Abstract

Abstraction-Carrying Code (ACC) has recently been proposed as a framework for mobile 
code safety in which the code supplier provides a program together with an abstraction 
(or abstract model of the program) whose validity entails compliance with a predefined 
safety policy. The abstraction thus plays the role of safety certificate and its generation
is carried out automatically by a fixpoint analyzer. The advantage of providing a (fixpoint)
abstraction to the code consumer is that its validity is checked in a single pass
(i.e., one iteration) of an abstract interpretation-based checker. A main challenge in making
ACC useful in practice is to reduce the size of certificates as much as possible without
increasing checking time. The intuitive idea is to include in the
certificate only the information that the checker is unable to reproduce without iterating. We
introduce the notion of reduced certificate, which characterizes the subset of the abstraction
which a checker needs in order to validate (and re-construct) the full certificate in a single 
pass. Based on this notion, we instrument a generic analysis algorithm with the necessary 
extensions in order to identify the information relevant to the checker. Interestingly, the 
fact that the reduced certificate omits (parts of) the abstraction has implications in the 
design of the checker. We provide the sufficient conditions which allow us to ensure that 
1) if the checker succeeds in validating the certificate, then the certificate is valid for the 
program (correctness) and 2) the checker will succeed for any reduced certificate which is 
valid (completeness). Our approach has been implemented and benchmarked within the 
CiaoPP system. The experimental results show that our proposal is able to greatly reduce 
the size of certificates in practice. 

To appear in Theory and Practice of Logic Programming (TPLP). 

KEYWORDS: Proof-Carrying Code. Abstraction-Carrying Code. Static Analysis. Re- 
duced Certificates. 



* A preliminary version of this work appeared in the Proceedings of ICLP'06 (Albert et al. 2006). 



1 Introduction 

Proof-Carrying Code (PCC) (Necula 1997) is a general framework for mobile code
safety which proposes to associate safety information in the form of a certificate 
to programs. The certificate (or proof) is created at compile time by the certifier 
on the code supplier side, and it is packaged along with the code. The consumer 
which receives or downloads the (untrusted) code+certificate package can then run 
a checker which by an efficient inspection of the code and the certificate can verify 
the validity of the certificate and thus compliance with the safety policy. The key 
benefit of this "certificate-based" approach to mobile code safety is that the task 
of the consumer is reduced from the level of proving to the level of checking, a
procedure that should be much simpler, more efficient, and more automatic than generating
the original certificate.

Abstraction-Carrying Code (ACC) (Albert et al. 2005; Albert et al. 2008) has
been recently proposed as an enabling technology for PCC in which an abstrac- 
tion (or abstract model of the program) plays the role of certificate. An important 
feature of ACC is that not only the checking, but also the generation of the ab- 
straction, is carried out automatically by a fixpoint analyzer. In this article we will 
consider analyzers which construct a program analysis graph which is interpreted as 
an abstraction of the (possibly infinite) set of states explored by the concrete exe- 
cution. To capture the different graph traversal strategies used in different fixpoint 
algorithms, we use the generic description of (Hermenegildo et al. 2000), which
generalizes the algorithms used in state-of-the-art analysis engines. 

Essentially, the certification/analysis carried out by the supplier is an iterative 
process which repeatedly traverses the analysis graph until a fixpoint is reached. 
The analysis information inferred for each call which appears during the (multiple) 
graph traversals is stored in the answer table (Hermenegildo et al. 2000). After each
iteration (or graph traversal), if the answer computed for a certain call is different
from the one previously stored in the answer table, both answers are combined (by 
computing their lub) and the result is used 1) to update the table, and 2) to launch 
the recomputation of those calls whose answer depends on the answer currently 
computed. In the original ACC framework, the final full answer table constitutes 
the certificate. A main idea is that, since this certificate contains the fixpoint, a 
single pass over the analysis graph is sufficient to validate such certificate on the 
consumer side. 

One of the main challenges for the practical uptake of ACC (and related methods) 
is to produce certificates which are reasonably small. This is important since the 
certificate is transmitted together with the untrusted code and, hence, reducing its 
size will presumably contribute to a smaller transmission time, which is very relevant, for
instance, under limited bandwidth and/or expensive network connectivity conditions.
Also, this reduces the storage cost for the certificate. Nevertheless, a main concern
when reducing the size of the certificate is that checking time is not increased
(among other reasons because pervasive and embedded systems also typically suffer
from limited computing and power resources). In principle, the consumer could
use an analyzer for the purpose of generating the whole fixpoint from scratch, which
is still feasible as analysis is automatic. However, this would defeat one of the main
purposes of ACC, which is to reduce checking time. The objective of this work is
to characterize the smallest subset of the abstraction which must be sent within a
certificate (and which still guarantees a single-pass checking process) and to design
an ACC scheme which generates and validates such reduced certificates. The main
contributions of this article are: 

1. The notion of reduced certificate which characterizes the subset of the abstrac- 
tion which, for a given analysis graph traversal strategy, the checker needs in 
order to validate (and re-construct) the full certificate in a single pass. 

2. An instrumentation of the generic abstract interpretation-based analysis
algorithm of (Hermenegildo et al. 2000) with the necessary extensions in order
to identify the information relevant to the checker.

3. A checker for reduced certificates which is correct, i.e., if the checker succeeds 
in validating the certificate, then the certificate is valid for the program. 

4. Sufficient conditions for ensuring completeness of the checking process. Con- 
cretely, if the checker uses the same strategy as the analyzer then our proposed 
checker will succeed for any reduced certificate which is valid. 

5. An experimental evaluation of the effect of our approach on the CiaoPP system
(Hermenegildo et al. 2005), the abstract interpretation-based preprocessor of
the Ciao multi-paradigm (Constraint) Logic Programming system. The
experimental results show that the certificate can be greatly reduced (by a factor
of 3.35) with no increase in checking time.

Both the ACC framework and our work here are applied at the source level. In 
contrast, in existing PCC frameworks, the code supplier typically packages the
certificate with the object code rather than with the source code (both are untrusted).
Nevertheless, our choice of making our presentation at the source level is without 
loss of generality because both the original ideas in the ACC approach and those in 
our current proposal can also be applied directly to bytecode. Indeed, a good num- 
ber of abstract interpretation-based analyses have been proposed in the literature 
for bytecode and machine code, most of which compute a fixpoint during analysis 
which can be reduced using the general principle of our proposal. For instance, in 
recent work, the concrete CLP verifier used in the original ACC implementation 
has itself been shown to be applicable without modification also to Java bytecode 
via a transformational approach, based on partial evaluation (Albert et al. 2007), or
via direct transformation (Mendez-Lojo et al. 2007a) using standard tools such as
Soot (Vallee-Rai et al. 1999). Furthermore, in (Mendez-Lojo et al. 2007b; Mendez-Lojo et al. 2007a)
a fixpoint-based analysis framework has been developed specifically for Java byte- 
code which is essentially equivalent to that used in the ACC proposal and to the one 
that we will apply in this work on the producer side to perform the analysis and ver- 
ification. This supports the direct applicability of our approach to bytecode-based 
program representations and, in general, to other languages and paradigms. 

The rest of the article is organized as follows. The following section presents a 
general view of ACC. Section 3 gives a brief overview of our method by means
of a simple example. Section 4 recalls the certification process performed by the
code supplier and illustrates it with a running example. Section 5 characterizes the
notion of reduced certificate and instruments a generic certifier for its generation.
Section 6 presents a generic checker for reduced certificates together with correctness
and completeness results. Finally, Section 7 discusses some experimental results and
related work.



2 A General View of Abstraction-Carrying Code 

We assume the reader is familiar with abstract interpretation (see (Cousot and Cousot 1977))
and (Constraint) Logic Programming (C)LP (see, e.g., (Marriot and Stuckey 1998)
and (Lloyd 1987)).

An abstract interpretation-based certifier is a function
$\mathit{certifier} : \mathit{Prog} \times \mathit{ADom} \times \mathit{APol} \mapsto \mathit{ACert}$
which, for a given program $P \in \mathit{Prog}$, an abstract domain $(D_\alpha, \sqsubseteq) \in \mathit{ADom}$
and a safety policy $I_\alpha \in \mathit{APol}$, generates a certificate $\mathit{Cert}_\alpha \in \mathit{ACert}$,
by using an abstract interpreter for $D_\alpha$, which entails that $P$ satisfies $I_\alpha$. In the
following, we denote that $I_\alpha$ and $\mathit{Cert}_\alpha$ are specifications given as abstract semantic
values of $D_\alpha$ by using the same $\alpha$. The essential idea in the certification process
carried out in ACC is that a fixpoint static analyzer is used to automatically infer
an abstract model (or simply abstraction) about the mobile code which can then
be used to prove that the code is safe w.r.t. the given policy in a straightforward
way. The basics for defining the abstract interpretation-based certifiers in ACC are
summarized in the following four points and equations.

Approximation. We consider a description (or abstract) domain $(D_\alpha, \sqsubseteq) \in \mathit{ADom}$
and its corresponding concrete domain $(2^D, \subseteq)$, both with a complete lattice
structure. Description (or abstract) values and sets of concrete values are related
by an abstraction function $\alpha : 2^D \to D_\alpha$, and a concretization function
$\gamma : D_\alpha \to 2^D$. The pair $(\alpha, \gamma)$ forms a Galois connection. The concrete and
abstract domains must be related in such a way that the following condition
holds (Cousot and Cousot 1977):

$$\forall x \in 2^D,\ \forall y \in D_\alpha : (\alpha(x) \sqsubseteq y) \iff (x \subseteq \gamma(y))$$

In general $\sqsubseteq$ is induced by $\subseteq$ and $\alpha$. Similarly, the operations of least upper bound
($\sqcup$) and greatest lower bound ($\sqcap$) mimic those of $2^D$ in a precise sense.

Abstraction generation. We consider the class of fixpoint semantics in which a
(monotonic) semantic operator, $S_P$, is associated to each program $P$. The meaning
of the program, $[\![P]\!]$, is defined as the least fixed point of the $S_P$ operator,
i.e., $[\![P]\!] = \mathrm{lfp}(S_P)$. If $S_P$ is continuous, the least fixed point is the limit of an
iterative process involving at most $\omega$ applications of $S_P$ starting from the bottom
element of the lattice. Using abstract interpretation, we can use an operator $S_P^\alpha$
which works in the abstract domain and which is the abstract counterpart of
$S_P$. This operator induces the abstract meaning of the program, which we refer
to as $[\![P]\!]_\alpha$. Now, again, starting from the bottom element of the lattice we can
obtain the least fixpoint of $S_P^\alpha$, denoted $\mathrm{lfp}(S_P^\alpha)$, and we define $[\![P]\!]_\alpha = \mathrm{lfp}(S_P^\alpha)$.
Correctness of analysis (Cousot and Cousot 1977) ensures that $[\![P]\!]_\alpha$ safely
approximates $[\![P]\!]$, i.e., $[\![P]\!] \in \gamma([\![P]\!]_\alpha)$. In actual analyzers, it is often the case that
the analysis computes a post-fixpoint of $S_P^\alpha$, which we refer to as $\mathit{Cert}_\alpha$, instead
of the least fixpoint. The reason for this is that computing the least fixpoint may
require too large a (even infinite) number of iterations. An analyzer is a function
$\mathit{analyzer} : \mathit{Prog} \times \mathit{ADom} \mapsto \mathit{ACert}$ such that:

$$\mathit{analyzer}(P, D_\alpha) = \mathit{Cert}_\alpha \;\wedge\; S_P^\alpha(\mathit{Cert}_\alpha) \sqsubseteq \mathit{Cert}_\alpha \qquad (1)$$

Since $[\![P]\!]_\alpha \sqsubseteq \mathit{Cert}_\alpha$, $\mathit{Cert}_\alpha$ is a safe approximation of $[\![P]\!]$.

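Verification Condition. Let $\mathit{Cert}_\alpha$ be a safe approximation of $[\![P]\!]$. If an
abstract safety specification $I_\alpha$ can be proved w.r.t. $\mathit{Cert}_\alpha$, then $P$ satisfies the
safety policy and $\mathit{Cert}_\alpha$ is a valid certificate:

$$\mathit{Cert}_\alpha \text{ is a valid certificate for } P \text{ w.r.t. } I_\alpha \text{ if } \mathit{Cert}_\alpha \sqsubseteq I_\alpha \qquad (2)$$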

Certification. Together, Equations (1) and (2) define a certifier which provides
program fixpoints, $\mathit{Cert}_\alpha$, as certificates which entail a given safety policy, i.e.,
by taking $\mathit{Cert}_\alpha = \mathit{analyzer}(P, D_\alpha)$.
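
The four points above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the chain domain bot <= int <= real <= term (the one used in the examples of Section 3), the representation of an abstraction as a table from predicate names to abstract values, the toy operator `s_p`, and the function names. It is not the CiaoPP implementation.

```python
# Toy chain domain (an assumption for illustration): bot <= int <= real <= term.
ORDER = {"bot": 0, "int": 1, "real": 2, "term": 3}

def lub(x, y):
    """Least upper bound of two abstract values on the chain."""
    return x if ORDER[x] >= ORDER[y] else y

def table_leq(t1, t2):
    """t1 is below t2 on every key that t2 constrains (missing keys mean 'term')."""
    return all(ORDER[t1[k]] <= ORDER[t2[k]] for k in t2)

def s_p(table):
    """Hypothetical abstract operator for a program in which q calls p,
    and p succeeds with either a real or an integer."""
    return {"p": lub("real", "int"), "q": table["p"]}

def analyzer(op, preds):
    """Iterate from bottom until op(cert) is below cert, as in Equation (1)."""
    cert = {p: "bot" for p in preds}
    steps = 0
    while True:
        new = op(cert)
        if table_leq(new, cert):
            return cert, steps
        cert = {p: lub(cert[p], new[p]) for p in preds}
        steps += 1

def certifier(op, preds, policy):
    """Return the certificate if it entails the policy (Equation (2)), else None."""
    cert, _ = analyzer(op, preds)
    return cert if table_leq(cert, policy) else None
```

In this sketch `analyzer(s_p, ["p", "q"])` updates the table twice before it stabilizes with `real` for both predicates, and `certifier` returns `None` for a policy that requires `q` to yield `int`.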

The second main idea in ACC is that a simple, easy-to-trust abstract interpretation- 
based checker verifies the validity of the abstraction on the mobile code. The checker 
is defined as a specialized abstract interpreter whose key characteristic is that it 
does not need to iterate in order to reach a fixpoint (in contrast to standard ana- 
lyzers). The basics for defining the abstract interpretation-based checkers in ACC 
are summarized in the following two points and equations. 

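Checking. If a certificate $\mathit{Cert}_\alpha$ is a fixpoint of $S_P^\alpha$, then $S_P^\alpha(\mathit{Cert}_\alpha) = \mathit{Cert}_\alpha$.
Thus, a checker is a function $\mathit{checker} : \mathit{Prog} \times \mathit{ADom} \times \mathit{ACert} \mapsto \mathit{bool}$ which, for a
program $P \in \mathit{Prog}$, an abstract domain $D_\alpha \in \mathit{ADom}$ and a certificate $\mathit{Cert}_\alpha \in \mathit{ACert}$,
checks whether $\mathit{Cert}_\alpha$ is a fixpoint of $S_P^\alpha$ or not:

$$\mathit{checker}(P, D_\alpha, \mathit{Cert}_\alpha) \text{ returns true iff } (S_P^\alpha(\mathit{Cert}_\alpha) = \mathit{Cert}_\alpha) \qquad (3)$$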

Verification Condition Regeneration. To retain the safety guarantees, the
consumer must regenerate a trustworthy verification condition (Equation (2)) and use
the incoming certificate to test for adherence to the safety policy.

$$P \text{ is trusted iff } \mathit{Cert}_\alpha \sqsubseteq I_\alpha \qquad (4)$$

Therefore, the general idea in ACC is that, while analysis (Equation (1)) is an
iterative process, which may traverse (parts of) the abstraction more than once
until the fixpoint is reached, checking (Equation (3)) is guaranteed to be done in
a single pass over the abstraction. This characterization of checking ensures that
the task performed by the consumers is indeed strictly more efficient than the
certification carried out by the producers, as shown in (Albert et al. 2005).
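
The consumer side, Equations (3) and (4), can be sketched in the same hedged style. The table representation, the chain domain, the toy operator `s_p` and the names `checker` and `trusted` are all assumptions made for illustration; the point is only that no loop is needed.

```python
# Toy chain domain (an assumption for illustration): bot <= int <= real <= term.
ORDER = {"bot": 0, "int": 1, "real": 2, "term": 3}

def lub(x, y):
    """Least upper bound of two abstract values on the chain."""
    return x if ORDER[x] >= ORDER[y] else y

def s_p(table):
    """Hypothetical abstract operator: q calls p, p yields a real or an integer."""
    return {"p": lub("real", "int"), "q": table["p"]}

def checker(op, cert):
    """Equation (3): one application of the operator must reproduce cert."""
    return op(cert) == cert

def trusted(op, cert, policy):
    """Equations (3) and (4): cert is a fixpoint and it entails the policy."""
    entails = all(ORDER[cert[k]] <= ORDER[policy[k]] for k in policy)
    return checker(op, cert) and entails
```

Note that, as written in Equation (3), this sketch also rejects the table that maps both predicates to `term`: it over-approximates the program, but one application of `s_p` does not reproduce it exactly.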

3 An Informal Account of our Method

In this section we provide an informal account of the idea of reduced certificate 
within the ACC framework by means of a very simple example. 

Example 3.1 

Consider the following program, the simple abstract domain $\bot \sqsubseteq int \sqsubseteq real \sqsubseteq term$
that we will use in all our examples, and the initial calling pattern
$S_\alpha = \{q(X){:}(term)\}$, which indicates that q can be called with any term as argument:

    q(X) :- p(X).
    p(X) :- X = 1.0.
    p(X) :- X = 1.

A (top-down) analyzer for logic programs would start the analysis of q(term), which
in turn requires the analysis of p(term) and, as a result, the following fixpoint can
be inferred: $\mathit{Cert}_\alpha = \{q(X){:}(term) \mapsto (real),\ p(X){:}(term) \mapsto (real)\}$. This gives us
a safe approximation of the result of executing q(X). In particular, it says that
we obtain a real number as a result of executing q(X). Observe that the fixpoint
is sound but possibly inaccurate since when only the second rule defining p(X) is
executed, we would obtain an integer number.

Given a safety policy, the next step in any approach to PCC is to verify that
$\mathit{Cert}_\alpha$ entails such policy. For instance, if the safety policy specifies that
$I_\alpha = \{q(X){:}(term) \mapsto (term)\}$, then clearly $\mathit{Cert}_\alpha \sqsubseteq I_\alpha$ holds and, hence, $\mathit{Cert}_\alpha$ can be
used as a certificate. Similarly, a safety policy $I'_\alpha = \{q(X){:}(term) \mapsto (real)\}$ is
entailed by the certificate, while $I''_\alpha = \{q(X){:}(term) \mapsto (int)\}$ is not.

The next important idea in ACC is that, given a valid certificate $\mathit{Cert}_\alpha$, a single
pass of a static analyzer over it must not change the result and, hence, this way
$\mathit{Cert}_\alpha$ can be validated. Observe that when analyzing the second rule of p(X) the
inferred information $X \mapsto int$ is lubbed with $X \mapsto real$, which we have in the
certificate and, hence, the fixpoint does not change. Therefore, the checker can be
implemented as a non-iterating single-pass analyzer over the certificate. If the result
of applying the checker to $\mathit{Cert}_\alpha$ yields a result that is different from $\mathit{Cert}_\alpha$, an error
is issued. Once the checker has verified that $\mathit{Cert}_\alpha$ is a fixpoint (and thus it safely
approximates the program semantics), the only thing left is to verify that $\mathit{Cert}_\alpha$
entails $I_\alpha$, thus ensuring that the validated certificate enforces the safety policy,
exactly as the certifier does.

We now turn to the key idea of reduced certificates in ACC: the observation 
that any information in the certificate that the checker is able to reconstruct by 
itself in a single-pass does not need to be included in the certificate. For example, if 
generation of the certificate does not require iteration, then no information needs to 
be included in the certificate, since by performing the same steps as the generator 
the checker will not iterate. If the generator does need to iterate, then the challenge 
is to find the minimal amount of information that needs to be included in $\mathit{Cert}_\alpha$
to avoid such iteration in the checker. 

Whether a generator requires iteration depends on the strategy used when com- 
puting the fixpoint as well as on the domain and the program itself (presence of 
loops and recursions, multivariance, etc.). In fact, much work has been done in 
order to devise optimized strategies to reduce iterations during analysis as much as possible.
As mentioned before, (Hermenegildo et al. 2000), which will be our
starting point, presents a parametric algorithm that allows capturing a large class
of such strategies. An important observation is that whether the checker can avoid 
iteration is controlled by the same factors as in the generator, modified only by the 
effects of the information included in the (reduced) certificate, that we would like 
to be minimal. 

As an (oversimplified) example in order to explain this idea, let us consider two
possible fixpoint strategies, each one used equally in both the analyzer (generator) 
and the checker: 

(1) a strategy which first analyzes the first rule for p(X) and then the second one, 
and 

(2) a strategy which analyzes the rules in the opposite order to (1).

Assume also that the analyzer has the simple iteration rule that as soon as an answer 
changes during analysis then analysis is restarted at the top (these strategies are 
really too simple and no practical analyzer would really iterate on this example, 
but they are useful for illustration here; the general issue of strategies will become
clear later in the paper).

In (1), the answer $X \mapsto real$ is inferred after the analysis of the first rule. Then,
the second rule is analyzed, which leads to the answer $X \mapsto int$, which is lubbed with
the previous one, yielding $X \mapsto real$. Hence, in a single pass over the program the
fixpoint is reached. Therefore, with this strategy $X \mapsto real$ can be reconstructed
by the checker without iterating and should not be included in the certificate.

However, with strategy (2) we first obtain the answer $X \mapsto int$. Then, after the
analysis of the first rule, $X \mapsto real$ is inferred. When lubbing it with the previous
value, $X \mapsto int$, the answer obtained is $X \mapsto real$. Since the answer has changed,
the analyzer starts a new iteration in which it reanalyzes the second rule with the
new answer $X \mapsto real$. Since now nothing changes in this iteration, the fixpoint is
reached.

The key idea is that, if strategy (2) is used, then more than one iteration is 
needed to reach the fixpoint. Hence the certificate cannot be empty and instead it 
has to include (some of) the analysis information. The conclusion is that the notion 
of reduced certificate is strongly related to the strategy used during analysis and 
checking. □ 
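
The two strategies of the example can be replayed with a small Python sketch. It is only a model of the deliberately naive iteration rule described above, under stated assumptions: rule 1 of p(X) contributes `real`, rule 2 contributes `int`, the first answer stored for a call does not count as a change, and the function name `passes_needed` and its `initial` parameter (an answer supplied in advance, as a certificate would) are hypothetical.

```python
# Toy chain domain (an assumption for illustration): bot <= int <= real <= term.
ORDER = {"bot": 0, "int": 1, "real": 2, "term": 3}

def lub(x, y):
    """Least upper bound of two abstract values on the chain."""
    return x if ORDER[x] >= ORDER[y] else y

# Answers contributed by the two rules for p(X): X = 1.0 and X = 1.
RULE_ANSWERS = {1: "real", 2: "int"}

def passes_needed(rule_order, initial="bot"):
    """Count passes over the rules of p under 'restart when an answer changes'."""
    answer, passes = initial, 0
    while True:
        passes += 1
        changed = False
        for rule in rule_order:
            new = lub(answer, RULE_ANSWERS[rule])
            if answer != "bot" and new != answer:
                # A previously stored answer grew: restart at the top.
                answer, changed = new, True
                break
            answer = new
        if not changed:
            return answer, passes
```

Under these assumptions strategy (1), `passes_needed([1, 2])`, finishes in one pass, strategy (2), `passes_needed([2, 1])`, needs two, and strategy (2) started from the answer `real` finishes in one pass again, which is the intuition behind keeping that answer in the certificate.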

The remainder of the article will formalize and discuss in detail each of the above 
steps and issues. 



4 Generation of Certificates in Abstraction-Carrying Code 

This section recalls ACC and the notion of full certificate in the context of (C)LP 
(Albert et al. 2005). This programming paradigm offers a good number of
advantages for ACC, an important one being the maturity and sophistication of the
analysis tools available for it. It is also a non-trivial case in many ways, including 
the fact that logic variables and incomplete data structures essentially represent,
respectively, pointers and structures containing pointers (see also the arguments and
pointers to literature in Section 1, which provide evidence that our approach is
applicable essentially directly to programs in other programming paradigms, including
their bytecode representations).

Very briefly, terms are constructed from variables $x \in \mathcal{V}$, functors (e.g., $f$) and
predicates (e.g., $p$). We denote by $\{x_1/t_1, \ldots, x_n/t_n\}$ the substitution $\sigma$, where
$x_i \neq x_j$ if $i \neq j$, and $t_i$ are terms. A renaming is a substitution $\rho$ for which there
exists the inverse $\rho^{-1}$ such that $\rho\rho^{-1} = \rho^{-1}\rho = id$. A constraint is a conjunction
of expressions built from predefined predicates (such as inequalities over the reals)
whose arguments are constructed using predefined functions (such as real addition).
An atom has the form $p(t_1, \ldots, t_n)$ where $p$ is a predicate symbol and $t_i$ are terms.
A literal is either an atom or a constraint. A rule is of the form $H \mathrel{\texttt{:-}} D$ where $H$,
the head, is an atom and $D$, the body, is a possibly empty finite sequence of literals.
A constraint logic program $P \in \mathit{Prog}$, or program, is a finite set of rules. Program
rules are assumed to be normalized: only distinct variables are allowed to occur as
arguments to atoms. Furthermore, we require that each rule defining a predicate $p$
has an identical sequence of variables $x_{p_1}, \ldots, x_{p_n}$ in the head atom, i.e., $p(x_{p_1}, \ldots, x_{p_n})$.
We call this the base form of $p$. This is not restrictive since programs can always
be normalized.



4.1 The Analysis Algorithm



Algorithm 1 has been presented in (Hermenegildo et al. 2000) as a generic
description of a fixpoint algorithm which generalizes those used in state-of-the-art
analysis engines, such as the one in CiaoPP (Hermenegildo et al. 2005), PLAI
(Muthukumar and Hermenegildo 199/; de la Banda et al. 1996), GAIA
(Le Charlier and Van Hentenryck 1994), and the CLP($\mathcal{R}$) analyzer (Kelly et al. 1998).
It has the description domain $D_\alpha$ (and functions on this domain) as parameters.
Different domains give analyzers which provide
different kinds of information and degrees of accuracy. In order to analyze a
program, traditional (goal dependent) abstract interpreters for (C)LP programs receive
as input, in addition to the program $P$ and the abstract domain $D_\alpha$, a set
$S_\alpha \subseteq \mathit{AAtom}$ of Abstract Atoms (or call patterns). Such call patterns are pairs of the form
$A : CP$ where $A$ is a procedure descriptor and $CP$ is an abstract substitution (i.e.,
a condition of the run-time bindings) of $A$ expressed as $CP \in D_\alpha$. For brevity, we
sometimes omit the subscript $\alpha$ in the algorithms. The analyzer of Algorithm 1,
Analyze_f, constructs an and-or graph (Bruynooghe 1991) (or analysis graph) for
$S_\alpha$ which is an abstraction of the (possibly infinite) set of (possibly infinite)
execution paths (and-or trees) explored by the concrete execution of the initial calls
described by $S_\alpha$ in $P$. Let $S_P^\alpha$ be the abstract semantics of the program for the call
patterns $S_\alpha$ defined in (Bruynooghe 1991). Following the notation in Section 2, the
analysis graph, denoted as $[\![P]\!]_\alpha$, corresponds to (or safely approximates) $\mathrm{lfp}(S_P^\alpha)$.
The program analysis graph is implicitly represented in the algorithm by means
of two global data structures, the answer table AT and the dependency arc table
DAT, both initially empty as shown at the beginning of Algorithm 1.¹

Algorithm 1 Generic Analyzer for Abstraction-Carrying Code

Initialization of global data structures: DAT = AT = ∅

 1: function Analyze_f(S_α ⊆ AAtom, Ω ∈ QHS)
 2:   for A : CP ∈ S_α do
 3:     add_event(newcall(A : CP), Ω);
 4:   while E := next_event(Ω) do
 5:     if E = newcall(A : CP) then new_call_pattern(A : CP, Ω);
 6:     else if E = updated(A : CP) then add_dependent_rules(A : CP, Ω);
 7:     else if E = arc(R) then process_arc(R, Ω);
 8:   return AT;

 9: procedure new_call_pattern(A : CP ∈ AAtom, Ω ∈ QHS)
10:   for all rule A_k :- B_{k,1}, ..., B_{k,n_k} do
11:     CP_0 := Aextend(CP, vars(..., B_{k,i}, ...));
12:     CP_1 := Arestrict(CP_0, vars(B_{k,1}));
13:     add_event(arc(A_k : CP ⇒ [CP_0] B_{k,1} : CP_1), Ω);
14:   add_answer_table(A : CP ↦ ⊥);

15: procedure process_arc(H_k : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 ∈ Dep, Ω ∈ QHS)
16:   if B_{k,i} is not a constraint then
17:     add H_k : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 to DAT;
18:   W := vars(H_k, B_{k,1}, ..., B_{k,n_k});
19:   CP_3 := get_answer(B_{k,i} : CP_2, CP_1, W, Ω);
20:   if CP_3 ≠ ⊥ and i ≠ n_k then
21:     CP_4 := Arestrict(CP_3, vars(B_{k,i+1}));
22:     add_event(arc(H_k : CP_0 ⇒ [CP_3] B_{k,i+1} : CP_4), Ω);
23:   else if CP_3 ≠ ⊥ and i = n_k then
24:     AP_1 := Arestrict(CP_3, vars(H_k)); insert_answer_info(H : CP_0 ↦ AP_1, Ω);

25: function get_answer(L : CP_2 ∈ AAtom, CP_1 ∈ D_α, W ⊆ V, Ω ∈ QHS)
26:   if L is a constraint then return Aadd(L, CP_1);
27:   else AP_0 := lookup_answer(L : CP_2, Ω); AP_1 := Aextend(AP_0, W);
28:     return Aconj(CP_1, AP_1);

29: function lookup_answer(A : CP ∈ AAtom, Ω ∈ QHS)
30:   if there exists a renaming σ s.t. σ(A : CP) ↦ AP in AT then
31:     return σ^{-1}(AP);
32:   else add_event(newcall(σ(A : CP)), Ω) where σ is a renaming s.t. σ(A) is in base form;
33:     return ⊥;

34: procedure insert_answer_info(H : CP ↦ AP ∈ Entry, Ω ∈ QHS)
35:   AP_0 := lookup_answer(H : CP); AP_1 := Alub(AP, AP_0);
36:   if AP_0 ≠ AP_1 then
37:     add_answer_table(H : CP ↦ AP_1);
38:     add_event(updated(H : CP), Ω);

39: procedure add_dependent_rules(A : CP ∈ AAtom, Ω ∈ QHS)
40:   for all arc of the form H_k : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 in graph where there exists
        renaming σ s.t. A : CP = (B_{k,i} : CP_2)σ do
41:     add_event(arc(H_k : CP_0 ⇒ [CP_1] B_{k,i} : CP_2), Ω);
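
As a rough illustration of the event-driven scheme of Algorithm 1, the following Python sketch runs a drastically simplified version of it on the program of Example 3.1. The simplifications are assumptions made for brevity and are not part of the algorithm: every predicate has a single argument X, an abstract substitution is just one value of the chain domain, a constraint `X = c` is modelled as a greatest lower bound with the type of `c`, renamings are ignored, and the queue is a plain FIFO (one possible queue handling strategy). The program encoding and all names are hypothetical.

```python
from collections import deque

# Toy chain domain (an assumption for illustration): bot <= int <= real <= term.
ORDER = {"bot": 0, "int": 1, "real": 2, "term": 3}

def lub(x, y):
    return x if ORDER[x] >= ORDER[y] else y

def glb(x, y):
    return x if ORDER[x] <= ORDER[y] else y

# q(X) :- p(X).   p(X) :- X = 1.0.   p(X) :- X = 1.
PROGRAM = {
    "q": [[("call", "p")]],
    "p": [[("eq", "real")], [("eq", "int")]],
}

def analyze(program, initial_calls):
    at = {}    # answer table: (pred, cp) -> answer
    dat = {}   # dependency arcs: (head, cp0, rule, literal) -> (cp1, callee)
    queue = deque(("newcall", call) for call in initial_calls)
    while queue:
        kind, data = queue.popleft()
        if kind == "newcall":
            if data in at:          # simplification: ignore duplicate requests
                continue
            pred, cp = data
            at[data] = "bot"
            for k in range(len(program[pred])):
                queue.append(("arc", (pred, cp, k, 0, cp)))
        elif kind == "updated":
            # Reprocess every arc whose body literal depends on this call.
            for (head, cp0, k, i), (cp1, callee) in list(dat.items()):
                if callee == data:
                    queue.append(("arc", (head, cp0, k, i, cp1)))
        else:  # kind == "arc": one left-to-right step in a rule body
            head, cp0, k, i, cp1 = data
            body = program[head][k]
            lit_kind, arg = body[i]
            if lit_kind == "eq":
                cp3 = glb(cp1, arg)
            else:
                call = (arg, cp1)
                dat[(head, cp0, k, i)] = (cp1, call)
                if call not in at:
                    queue.append(("newcall", call))
                cp3 = glb(cp1, at.get(call, "bot"))
            if cp3 == "bot":
                continue
            if i + 1 < len(body):
                queue.append(("arc", (head, cp0, k, i + 1, cp3)))
            else:
                old = at[(head, cp0)]
                new = lub(old, cp3)
                if new != old:
                    at[(head, cp0)] = new
                    queue.append(("updated", (head, cp0)))
    return at
```

Calling `analyze(PROGRAM, [("q", "term")])` yields an answer table in which both `("q", "term")` and `("p", "term")` map to `real`; the arc for the rule of q is processed twice, the second time because of the `updated` event raised for p.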



1 Given the information in these, it is straightforward to construct the graph and the associated 
program-point annotations. 



Definition 4.1 (answer and dependency arc table)

Let $P \in \mathit{Prog}$ be a program and $D_\alpha$ an abstract domain.

• An Answer Table ($AT \subseteq \mathit{Entry}$) for $P$ and $D_\alpha$ is a set of entries of the form
$A : CP \mapsto AP \in \mathit{Entry}$ where $A : CP \in \mathit{AAtom}$, $A$ is always in base form and
$CP$ and $AP$ are abstract substitutions in $D_\alpha$.

• A Dependency Arc Table ($DAT \subseteq \mathit{Dep}$) for $P$ and $D_\alpha$ is a set of dependencies
of the form $A_k : CP_0 \Rightarrow [CP_1]\ B_{k,i} : CP_2 \in \mathit{Dep}$, where $A_k \mathrel{\texttt{:-}} B_{k,1}, \ldots, B_{k,n}$
is a program rule in $P$ and $CP_0$, $CP_1$, $CP_2$ are abstract substitutions in $D_\alpha$.

Informally, an entry $A : CP \mapsto AP$ in AT should be interpreted as "the answer
pattern for calls to $A$ satisfying precondition (or call pattern) $CP$ meets
postcondition (or answer pattern) $AP$." Dependencies are used for efficiency. As we
will explain later, Algorithm 1 finishes when there are no more events to be
processed (function Analyze_f). This happens when the answer table AT reaches
a fixpoint. Any entry $A : CP \mapsto AP$ in AT is generated by analyzing all rules
associated to $A$ (procedure new_call_pattern). Thus, if we have a rule of the
form $A_k \mathrel{\texttt{:-}} B_{k,1}, \ldots, B_{k,n}$, we know that the answer for $A$ depends on the answers
for all literals in the body of the rule. We annotate this fact in DAT by means
of the dependencies $A_k : CP \Rightarrow [CP_{k,i-1}]\ B_{k,i} : CP_{k,i}$, $i \in \{1, \ldots, n\}$, which mean
that the answer for $A_k : CP$ depends on the answer for $B_{k,i} : CP_{k,i}$, also stored
in AT. Then if, during the analysis, the answer for $B_{k,i} : CP_{k,i}$ changes, the arc
$A_k : CP \Rightarrow [CP_{k,i-1}]\ B_{k,i} : CP_{k,i}$ must be reprocessed in order to compute the
"possibly" new answer for $A_k : CP$. This is to say that the rule for $A_k$ has to be
processed again starting from atom $B_{k,i}$. Thus, as we will see later, dependency
arcs are used for forcing recomputation until a fixpoint is reached. The remaining
part $CP_{k,i-1}$ is the program annotation just before $B_{k,i}$ is reached and contains
information about all variables in rule $k$. $CP_{k,i-1}$ is not really necessary, but is
included for efficiency.

Intuitively, the analysis algorithm is a graph traversal algorithm which places 
entries in the answer table AT and dependency arc table DAT as new nodes and 
arcs in the program analysis graph are encountered. To capture the different graph 
traversal strategies used in different fixpoint algorithms, a prioritized event queue 
is used. We use $\Omega \in \mathit{QHS}$ to refer to a Queue Handling Strategy which a particular
instance of the generic algorithm may use. Events are of three forms: 

• newcall(A : CP) which indicates that a new call pattern for literal A with 
abstract substitution CP has been encountered. 

• arc($H_k : \_ \Rightarrow [\_]\ B_{k,i} : \_$) which indicates that the rule with $H_k$ as head
needs to be (re)computed from the position $k,i$.

• updated (A : CP) which indicates that the answer to call pattern A with 
abstract substitution CP has been changed in AT. 
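
One way to picture the prioritized event queue and the role of the queue handling strategy is the following Python sketch. It is an assumption-laden illustration, not the CiaoPP data structure: events are tuples whose first component is the event kind, a strategy is modelled as a function from events to integer priorities (lower is served first), and the class and function names are hypothetical.

```python
import heapq
import itertools

class EventQueue:
    """Priority queue of analysis events; `priority` stands in for the strategy."""

    def __init__(self, priority):
        self._priority = priority
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps insertion order

    def add_event(self, event):
        entry = (self._priority(event), next(self._counter), event)
        heapq.heappush(self._heap, entry)

    def next_event(self):
        """Pop the highest-priority event, or return None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

def arcs_first(event):
    """Hypothetical strategy: finish pending arcs before anything else."""
    return {"arc": 0, "updated": 1, "newcall": 2}[event[0]]

def fifo(event):
    """Hypothetical strategy: plain first-in, first-out."""
    return 0
```

With `arcs_first`, an `arc` event added last is still served before earlier `newcall` and `updated` events, whereas `fifo` returns events in the order in which they were added.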

The algorithm is defined in terms of five abstract operations on the domain D a : 

• Arestrict($CP$, $V$) performs the abstract restriction of an abstract substitution
$CP$ to the set of variables in the set $V$.

• Aextend($CP$, $V$) extends the abstract substitution $CP$ to the variables in the
set $V$.

• Aadd($C$, $CP$) performs the abstract operation of conjoining the actual
constraint $C$ with the abstract substitution $CP$.

• Aconj($CP_1$, $CP_2$) performs the abstract conjunction of two abstract
substitutions.

• Alub($CP_1$, $CP_2$) performs the abstract disjunction of two abstract
substitutions.
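
For the simple type domain used in the examples, the five operations could look as follows in Python. This is a sketch under explicit assumptions: an abstract substitution is a dictionary from variable names to one of `int`, `real`, `term`; a variable missing from the dictionary is treated as `term`; the bottom substitution is represented by `None`; and a constraint is a hypothetical pair such as `("X", "real")` standing for `X = 1.0`. On this particular chain the conjunction of two non-bottom substitutions never fails, which a realistic domain would not guarantee.

```python
# Non-bottom values of the toy chain domain: int <= real <= term.
ORDER = {"int": 1, "real": 2, "term": 3}
BOTTOM = None  # the unsatisfiable abstract substitution

def _glb(x, y):
    return x if ORDER[x] <= ORDER[y] else y

def _lub(x, y):
    return x if ORDER[x] >= ORDER[y] else y

def a_restrict(cp, variables):
    """Keep only the bindings for the given variables."""
    if cp is BOTTOM:
        return BOTTOM
    return {v: t for v, t in cp.items() if v in variables}

def a_extend(cp, variables):
    """Add the given variables, unconstrained, to the substitution."""
    if cp is BOTTOM:
        return BOTTOM
    return {**{v: "term" for v in variables}, **cp}

def a_conj(cp1, cp2):
    """Abstract conjunction: pointwise greatest lower bound."""
    if cp1 is BOTTOM or cp2 is BOTTOM:
        return BOTTOM
    out = dict(cp1)
    for v, t in cp2.items():
        out[v] = _glb(out.get(v, "term"), t)
    return out

def a_add(constraint, cp):
    """Conjoin a constraint, given as a (variable, type) pair, with cp."""
    var, typ = constraint
    return a_conj(cp, {var: typ})

def a_lub(cp1, cp2):
    """Abstract disjunction: pointwise least upper bound."""
    if cp1 is BOTTOM:
        return cp2
    if cp2 is BOTTOM:
        return cp1
    keys = cp1.keys() | cp2.keys()
    return {v: _lub(cp1.get(v, "term"), cp2.get(v, "term")) for v in keys}
```

For instance, `a_add(("X", "real"), {"X": "term"})` gives `{"X": "real"}`, and `a_lub({"X": "real"}, {"X": "int"})` gives `{"X": "real"}`, matching the lub step in Example 3.1.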

Apart from the parametric domain-dependent functions, the algorithm has several
other undefined functions. The functions add_event and next_event respectively push
an event to the priority queue and pop the event of highest priority, according to
$\Omega$. When an arc $H_k : CP \Rightarrow [CP'']\ B_{k,i} : CP'$ is added to DAT, it replaces any
other arc of the form $H_k : CP \Rightarrow [\_]\ B_{k,i} : \_$ (modulo renaming) in the table
and the priority queue. Similarly, when an entry $H_k : CP \mapsto AP$ is added to the
AT (add_answer_table), it replaces any entry of the form $H_k : CP \mapsto \_$ (modulo
renaming). Note that the underscore (_) matches any description, and that there is
at most one matching entry in DAT or AT at any time.

More details on the algorithm can be found in (Hermenegildo et al. 2000; Puebla and Hermenegildo 1996).
Let us briefly explain its main procedures: 

• The algorithm centers around the processing of events on the priority queue,
which repeatedly removes the highest priority event (L4) and calls the
appropriate event-handling function (L5-7).

• The function new_call_pattern initiates processing of all the rules for the
definition of the internal literal $A$, by adding arc events for each of the first
literals of these rules (L13). Initially, the answer for the call pattern is set to
$\bot$ (L14).

• The procedure PROCESS_ARC performs the core of the analysis. It performs a
single step of the left-to-right traversal of a rule body.

- If the literal $B_{k,i}$ is not a constraint, the arc is added to DAT.

- Atoms are processed by function GET_ANSWER:

  * Constraints are simply added to the current description (L26).

  * In the case of literals, the function LOOKUP_ANSWER first looks up
    an answer for the given call pattern in AT (L30) and, if it is not
    found, it places a newcall event (L32). When it finds one, this
    answer is extended to the variables in the rule the literal occurs
    in (L27) and conjoined with the current abstract substitution (L28).

- The resulting answer (L19) is either used to generate a new arc event
  to process the next literal in the rule, if $B_{k,i}$ is not the last one (L20);
  otherwise, the new answer is computed by INSERT_ANSWER_INFO.

• The part of the algorithm that is more relevant to the generation of reduced
certificates is within INSERT_ANSWER_INFO. The new answer for the rule is
combined with the current answer in the table (L35). If the fixpoint for such
call has not been reached, then the corresponding entry in AT is updated
with the combined answer (L37) and an updated event is added to the queue.

• The purpose of an updated event is that the function ADD_DEPENDENT_RULES
(re)processes those calls which depend on the call pattern A : CP whose
answer has been updated (L40). This effect is achieved by adding the arc
events for each of its dependencies. The fact that dependency arcs
contain information at the level of body literals, identified by a pair k, i, 
allows reprocessing only those rules for the predicate which depend on the 
updated pattern. Furthermore, those rules are reprocessed precisely from the 
body atom whose answer has been updated. If, instead, dependencies were 
kept at the level of rules, rules would need to be reprocessed always from 
the leftmost atom. Furthermore, if dependencies were kept at the level of 
predicates, all rules for a predicate would have to be reprocessed from the 
leftmost atom as soon as an answer pattern it depended on were updated. 
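
The event-driven structure described above can be sketched as a small dispatcher over a prioritized queue. This is only an illustration under stated assumptions: the class `EventQueue`, the function `run`, and the encoding of events as tuples whose first component is the event kind are hypothetical names, the strategy $\Omega$ is modelled as a function from events to numeric priorities, and the replacement of equivalent arcs already in the queue is omitted.

```python
import heapq
import itertools

class EventQueue:
    """Prioritized event queue; `priority` plays the role of the strategy."""

    def __init__(self, priority):
        self._priority = priority
        self._heap = []
        self._tick = itertools.count()   # tie-breaker: insertion order

    def add_event(self, event):
        # heapq is a min-heap, so the priority is negated.
        entry = (-self._priority(event), next(self._tick), event)
        heapq.heappush(self._heap, entry)

    def next_event(self):
        """Pop the event of highest priority, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

def run(queue, handlers):
    """Main loop: pop events and dispatch on their kind until none is left."""
    trace = []
    event = queue.next_event()
    while event is not None:
        trace.append(event)
        handlers[event[0]](event, queue)
        event = queue.next_event()
    return trace
```

A strategy that prefers arc events over updated events, as assumed in the running example of the next section, corresponds here to a `priority` function that returns a larger number for arcs than for updated events.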

In the following section, we illustrate the algorithm by means of an example. 

4.2 Running Example

Our running example is the program rectoy taken from (Rose 1998). We will use
it to illustrate our algorithms and show that our approach improves on state-of- 
the-art techniques for reducing the size of certificates. Our approach can deal with 
the very wide class of properties for which abstract interpretation has been proved 
useful (for example in the context of LP this includes variable sharing, determi- 
nacy, non-failure, termination, term size, etc.). For brevity and concreteness, in all 
our examples abstract substitutions simply assign an abstract value in the simple 
domain introduced in Section 3 to each variable in a set V over which each such
substitution ranges. We use term as the most general type (i.e., term corresponds 
to all possible terms). For brevity, variables whose regular type is term are often 
not shown in abstract substitutions. Also, when it is clear from the context, an
abstract substitution for an atom $p(x_1, \ldots, x_n)$ is shown as a tuple $(t_1, \ldots, t_n)$, such
that each value $t_i$ indicates the type of $x_i$. The most general substitution $\top$ assigns
term to all variables in V. The least general substitution $\bot$ assigns the empty set
of values to each variable.

Example 4.2

Consider the Ciao version of procedure rectoy (Rose 1998) and the call pattern
rectoy(N,M) : (int,term) which indicates that external calls to rectoy are per- 
formed with an integer value, int, in the first argument N: 

rectoy(N,M) :- N = 0, M = 0. 

rectoy(N,M) :- N1 is N-1, rectoy(N1,R), M is N1+R.

We now briefly describe four main steps carried out in the analysis using some
$\Omega \in QHS$:






A. The initial event newcall(rectoy(N,M) : (int, term)) introduces the arcs $A_{1,1}$
and $A_{2,1}$ in the queue, each of which corresponds to one of the rules in the order above:

$A_{1,1}$ = arc(rectoy(N,M) : (int, term) $\Rightarrow$ [{N/int}] N=0 : {N/int})

$A_{2,1}$ = arc(rectoy(N,M) : (int, term) $\Rightarrow$ [{N/int}] N1 is N - 1 : {N/int})

The initial answer $E_1$ = rectoy(N,M) : (int, term) $\mapsto$ $\bot$ is inserted in AT.

B. Assume that $\Omega$ assigns higher priority to $A_{1,1}$. The procedure GET_ANSWER
simply adds the constraint N=0 to the abstract substitution {N/int}. Upon
return, as it is not the last body atom, the following arc event is generated:

$A_{1,2}$ = arc(rectoy(N,M) : (int, term) $\Rightarrow$ [{N/int}] M=0 : {M/term})

Arc $A_{1,2}$ is handled exactly as $A_{1,1}$ and GET_ANSWER simply adds the
constraint M=0, returning {N/int, M/int}. As it is the last atom in the body
(L25), procedure INSERT_ANSWER_INFO computes Alub between $\bot$ and the
above answer and overwrites $E_1$ with:

$E_1'$ = rectoy(N,M) : (int, term) $\mapsto$ (int, int)

Therefore, the event $U_1$ = updated(rectoy(N,M) : (int, term)) is introduced in
the queue. Note that no dependency has been originated during the processing
of this rule (as both body atoms are constraints).

C. Now, $\Omega$ can choose between the processing of $U_1$ or $A_{2,1}$. Let us assume that
$A_{2,1}$ has higher priority. For its processing, we have to assume that the predefined
functions "-", "+" and "is" are dealt with by the algorithm as standard
constraints by just using the following information provided by the system:

$E_2$ = C is A + B : (int, int, term) $\mapsto$ (int, int, int)
$E_3$ = C is A - B : (int, int, term) $\mapsto$ (int, int, int)

where the three values in the abstract substitutions correspond to variables
A, B, and C, in this order. In particular, after analyzing the subtraction with
the initial call pattern, we infer that N1 is of type int and no dependency is
asserted. Next, the arc:

$A_{2,2}$ = arc(rectoy(N,M) : (int, term) $\Rightarrow$
[{N/int, N1/int}] rectoy(N1,R) : (int, term))

is introduced in the queue and the corresponding dependency is stored in
DAT. The call to GET_ANSWER returns the current answer $E_1'$. Then, we use
this answer as call pattern to process the last addition by creating a new arc
$A_{2,3}$:

$A_{2,3}$ = arc(rectoy(N,M) : (int, term) $\Rightarrow$
[{N/int, N1/int, R/int}] M is N1 + R : {N1/int, R/int})

Clearly, the processing of $A_{2,3}$ does not change the final answer $E_1'$. Hence,
no more updates are introduced in the queue.

D. Finally, we have to process the event $U_1$ introduced in step B, to which $\Omega$
has assigned lowest priority. The procedure ADD_DEPENDENT_RULES finds the
dependency corresponding to arc $A_{2,2}$ and inserts it in the queue. This
relaunches an arc identical to $A_{2,2}$, which in turn launches an arc identical to
$A_{2,3}$. However, the reprocessing does not change the fixpoint result $E_1'$ and
the analysis terminates computing as answer table the entry $E_1'$ and as unique
dependency arc $A_{2,2}$.

Fig. 1. Analysis Graph for our Running Example

Figure 1 shows the analysis graph for the analysis above. The graph has two sorts
of nodes. Those which correspond to atoms are called "OR-nodes." An OR-node
of the form $^{CP}A^{AP}$ is interpreted as: the answer for the call pattern A : CP is AP.
For instance, the OR-node

$^{\{N1/int\}}$rectoy(N1,R)$^{\{N1/int,\ R/int\}}$

indicates that, when the atom rectoy(N1,R) is called with the abstract substitution
(int,term), the answer computed is (int,int). As mentioned before, variables
whose type is term will often not be shown in what follows. Those nodes which
correspond to rules are called "AND-nodes." In Figure 1, they appear within a
dotted box and contain the head of the corresponding clause. Each AND-node has
as children as many OR-nodes as there are atoms in the body. If a child OR-node
is already in the tree, it is not expanded any further and the currently available
answer is used. For instance, the analysis graph in the figure at hand contains two
occurrences of the abstract atom rectoy(N,M) : (int, term) (modulo renaming),
but only one of them (the root) has been expanded. This is depicted by a dashed
arrow from the non-expanded occurrence to the expanded one.

The answer table AT contains entries for the different OR-nodes which appear
in the graph. In our example AT contains $E_1'$ associated to the (root) OR-node
discussed above. Dependencies in DAT indicate direct relations among OR-nodes.
An OR-node $A_F : CP_F$ depends on another OR-node $A_T : CP_T$ iff the
OR-node $A_T : CP_T$ appears in the body of some clause for $A_F : CP_F$. For instance,
the dependency $A_{2,2}$ indicates that the OR-node rectoy(N1,R) : (int,term) is
used in the OR-node rectoy(N,M) : (int,term). Thus, if the answer pattern for
rectoy(N1,R) : (int,term) is ever updated, then we must reprocess the OR-node
rectoy(N,M) : (int,term). □

4.3 Full Certificate

The following definition corresponds to the essential idea in the ACC framework,
Equations (1) and (2), of using a static analyzer to generate the certificates. The
analyzer corresponds to Algorithm 1 and the certificate is the full answer table.

Definition 4.3 (full certificate)

We define function Certifier_f : $Prog \times ADom \times 2^{AAtom} \times APol \times QHS \mapsto ACert$
which takes $P \in Prog$, $D_\alpha \in ADom$, $S_\alpha \subseteq AAtom$, $I_\alpha \in APol$, $\Omega \in QHS$ and
returns as full certificate, $FCert \in ACert$, the answer table computed by
Analyze_f($S_\alpha$, $\Omega$) for P in $D_\alpha$ iff $FCert \sqsubseteq I_\alpha$.

If the inclusion does not hold, we do not have a certificate. This can happen either 
because the program does not satisfy the policy or because the analyzer is not 
precise enough. In the latter case, a solution is to try analyzing with a more precise 
(and generally more expensive) abstract domain. In the former case (the program 
does not satisfy the policy), this can be due to two possible reasons. A first one is 
that we have formalized a policy which is unnecessarily restrictive, in which case 
the solution is to weaken it. The other possible reason is that the program actually 
violates the policy, either inadvertently or on purpose. In such a case there is of 
course no way a certificate can be found for such program and policy. 

Example 4.4

Consider the safety policy expressed by the following specification $I_\alpha$: rectoy(N,M) :
(int, term) $\mapsto$ (int, real). The certifier in Definition 4.3 returns as valid certificate
the single entry $E_1'$. Clearly $E_1' \sqsubseteq I_\alpha$ since $\bot \sqsubseteq int \sqsubseteq real \sqsubseteq term$. □
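
The inclusion test of this example can be sketched in a few lines. The encoding is an assumption of the sketch, not the paper's definition of the ordering on certificates: answer tables and policies are dictionaries from call patterns to tuples of abstract values, and a certificate is taken to satisfy the policy when every call pattern constrained by the policy has a certified answer that is argument-wise below the one in the policy.

```python
# Chain domain: bot < int < real < term (hypothetical encoding for this sketch).
RANK = {"bot": 0, "int": 1, "real": 2, "term": 3}

def leq(t1, t2):
    """Ordering on abstract values of the chain domain."""
    return RANK[t1] <= RANK[t2]

def satisfies_policy(fcert, policy):
    """Check, entry by entry, that the certified answers are below the policy."""
    for call_pattern, allowed in policy.items():
        answer = fcert.get(call_pattern)
        if answer is None:
            return False          # no certified answer for a constrained call
        if not all(leq(a, s) for a, s in zip(answer, allowed)):
            return False
    return True

# Entry E_1' and the policy of the example above.
ROOT = ("rectoy", ("int", "term"))
FCERT = {ROOT: ("int", "int")}
POLICY = {ROOT: ("int", "real")}
print(satisfies_policy(FCERT, POLICY))
```

With a stricter, purely hypothetical policy that required the second argument to be $\bot$, the same check would fail, which corresponds to the case in which no certificate is produced.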

5 Abstraction-Carrying Code with Reduced Certificates 

As already mentioned in Section 1, in the ACC framework, since this certificate
contains the fixpoint, a single pass over the analysis graph is sufficient to validate
such certificate on the consumer side. The key observation in order to reduce the
size of certificates within the ACC framework is that certain entries in a
certificate may be irrelevant, in the sense that the checker is able to reproduce them
by itself in a single pass. The notion of relevance is directly related to the idea
of recomputation in the program analysis graph. Intuitively, given an entry in the
answer table $A : CP \mapsto AP$, its fixpoint may have been computed in several
iterations from $\bot, AP_0, AP_1, \ldots$ until AP. For each change in the answer, an event
updated(A : CP) is generated during the analysis. The above entry is relevant in
a certificate (under some strategy) when its updates launch the recomputation of
other arcs in the graph which depend on A : CP (i.e., there is a dependency from
it in the table). Thus, unless $A : CP \mapsto AP$ is included in the (reduced) certificate,
a single-pass checker which uses the same strategy as the code producer will not
be able to validate the certificate. Section 5.1 identifies redundant updates which
should not be considered. In Section 5.2 we characterize formally the notion of
reduced certificate containing only relevant answers. Then, in Section 5.3 we
instrument an analysis algorithm to identify relevant answers and define a certifier
based on the instrumented analyzer which generates reduced certificates.

5.1 Identifying Redundant Updates 

According to the above intuition, we are interested in determining when an entry
in the answer table has been "updated" during the analysis and such changes affect
other entries. There is a special kind of updated events which can be directly
considered irrelevant and correspond to those updates which launch a redundant
computation (like the $U_1$ event generated in step B of Example 4.2). We write $DAT|_{A:CP}$ to
denote the set of arcs of the form $H : CP_0 \Rightarrow [CP_1]\ B : CP_2 \in Dep$ in the current
dependency arc table which depend on A : CP, i.e., such that $A : CP = (B : CP_2)\sigma$
for some renaming $\sigma$.

Definition 5.1 (redundant update)

Let $P \in Prog$, $S_\alpha \subseteq AAtom$ and $\Omega \in QHS$. We say that an event updated(A : CP)
which appears in the prioritized event queue during the analysis of P for $S_\alpha$ is
redundant w.r.t. $\Omega$ if, when it is generated, $DAT|_{A:CP} = \emptyset$.
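
Read operationally, Definition 5.1 is a filter over the dependency arc table. The following minimal sketch illustrates it; the `Arc` record and the positional encoding of call patterns (which makes renaming implicit) are assumptions of the sketch and not part of the published algorithm.

```python
from typing import NamedTuple, Tuple

CallPattern = Tuple[str, Tuple[str, ...]]   # (predicate, abstract value per argument)

class Arc(NamedTuple):
    head: CallPattern      # H : CP_0
    rule: int              # k
    position: int          # i
    callee: CallPattern    # B_{k,i} : CP_2

def dependents(dat, call):
    """DAT restricted to `call`: the arcs whose body literal calls it."""
    return [arc for arc in dat if arc.callee == call]

def is_redundant_update(dat, call):
    """updated(call) is redundant if no arc depends on `call` when it is generated."""
    return not dependents(dat, call)

# Running example: the update generated in step B finds an empty DAT.
ROOT = ("rectoy", ("int", "term"))
dat = []
print(is_redundant_update(dat, ROOT))
# After step C the arc for the recursive call is stored, so a later update
# for the same call pattern would no longer be redundant.
dat.append(Arc(ROOT, 2, 2, ROOT))
print(is_redundant_update(dat, ROOT))
```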

It should be noted that redundant updates can only be generated by updated
events for call patterns which belong to $S_\alpha$, i.e., to the initial set of call patterns.
Otherwise, $DAT|_{A:CP}$ cannot be empty. Let us explain the intuition behind this.
Whenever an event updated(A : CP) with $A : CP \notin S_\alpha$ is generated, it is
because a rule for A has been completely analyzed. Hence, a corresponding call to
INSERT_ANSWER_INFO for A : CP (L23 in Algorithm 1) has been done. If such a rule
has been completely analyzed then all its arcs were introduced in the prioritized
event queue. Observe that the first time that an arc is introduced in the queue
is because a call to procedure NEW_CALL_PATTERN for A : CP occurred, i.e., a
newcall(A : CP) event was analyzed. Consider the first event newcall for A : CP.
If $A : CP \notin S_\alpha$, then this event originates from the analysis of some other arc of
the form $H : CP_0 \Rightarrow [CP_1]\ A : CP$ for which A : CP has no entry in the answer
table. Thus, the dependency $H : CP_0 \Rightarrow [CP_1]\ A : CP$ was added to DAT. Since
dependencies are never removed from DAT, any later updated event for A : CP
occurs under the condition $DAT|_{A:CP} \neq \emptyset$. Even if it is possible to fix the strategy
and define an analysis algorithm which does not introduce redundant updates, we
prefer to follow as much as possible the generic one.

Example 5.2

In our running example $U_1$ is redundant for $\Omega$ at the moment it is generated.
However, since the event has been given low priority its processing is delayed until
the end and, in the meantime, a dependency from it has been added. This causes
the unnecessary redundant recomputation of the second arc $A_{2,2}$ for rectoy. □

Note that redundant updates are indeed events which, if processed immediately,
correspond to "nops".

The following proposition ensures the correctness of using a queue handling
strategy which assigns the highest priority to redundant updates. This result can be
found in (Hermenegildo et al. 2000), where it is stated that Analyze_f is correct
independently of the order in which events in the prioritized event queue are
processed.

Proposition 5.3

Let $\Omega \in QHS$. Let $\Omega' \in QHS$ be a strategy which assigns the highest priority to any
updated event which is redundant. Then, $\forall P \in Prog$, $D_\alpha \in ADom$, $S_\alpha \subseteq AAtom$,

Analyze_f($S_\alpha$, $\Omega$) = Analyze_f($S_\alpha$, $\Omega'$).



5.2 The Notion of Reduced Certificate 

As mentioned above, the notion of reduced certificate is directly related to the idea 
of recomputation in the program analysis graph. Now, we are interested in finding 
those entries $A : CP \in Entry$ in the answer table, whose analysis has launched the
reprocessing of some arcs and hence recomputation has occurred. Certainly, the
reprocessing of an arc may only be caused by a non-redundant updated event for
A : CP, which inserted (via ADD_DEPENDENT_RULES) all arcs in $DAT|_{A:CP}$ into
the prioritized event queue. However, some updated events are not dangerous. For
instance, if the processing of an arc $H : CP_0 \Rightarrow [CP_1]\ A : CP$ has been stopped
because of the lack of answer for A : CP (L20 and L25 in Algorithm 1), this arc
must be considered as "suspended", since its continuation has not been introduced
in the queue. In particular, we do not take into account updated events for A : CP
which are generated when $DAT|_{A:CP}$ only contains suspended arcs. Note that this
case still corresponds to the first traversal of any arc and should not be considered 
as a reprocessing. The following definition introduces the notion of suspended arc, 
i.e., of an arc suspended during analysis. 

Definition 5.4 (suspended arc)

Let $P \in Prog$, $S_\alpha \subseteq AAtom$ and $\Omega \in QHS$. We say that an arc $H : CP_0 \Rightarrow
[CP_1]\ B : CP_2$ in the dependency arc table is suspended w.r.t. $\Omega$ during the analysis
of P for $S_\alpha$ iff when it is generated, the answer table does not contain any entry
for $B : CP_2$ or contains an entry of the form $B : CP_2 \mapsto \bot$.

For the rest of the updated events, their relevance depends strongly on the
strategy used to handle the prioritized event queue. For instance, assume that the
prioritized event queue contains an event $arc(H : CP_0 \Rightarrow [CP_1]\ A : CP)$, coming from
a suspended arc in DAT. If all updated events for A : CP are processed before this
arc (i.e., the fixpoint of A : CP is available before processing the arc), then these
updated events do not launch any recomputation. Let us define now the notion of
recomputation.

Definition 5.5 (multi-traversed arc)

Let $P \in Prog$, $S_\alpha \subseteq AAtom$ and $\Omega \in QHS$. We say that an arc $H : CP_0 \Rightarrow [CP_1]\ A :
CP_2$ in the dependency arc table has been multi-traversed w.r.t. $\Omega$ after the analysis
of P for $S_\alpha$ iff it has been introduced in the dependency arc table at least twice as
a non-suspended arc w.r.t. $\Omega$.

Example 5.6

Assume that we use a strategy $\Omega'' \in QHS$ such that step C in Example 4.2 is
performed before B (i.e., the second rule is analyzed before the first one). Then, when
the answer for rectoy(N1,R) : (int, term) is looked up, procedure GET_ANSWER
returns $\bot$ and thus the processing of arc $A_{2,2}$ is suspended at this point in the sense
that its continuation $A_{2,3}$ is not inserted in the queue (see L20 in Algorithm 1).
Indeed, we can proceed with the remaining arc $A_{1,1}$ which is processed exactly as
in step B. In this case, the updated event $U_1$ is not redundant for $\Omega''$, as there
is a suspended dependency introduced by the former processing of arc $A_{2,2}$ in the
table. Therefore, the processing of $U_1$ introduces the suspended arc $A_{2,2}$ again in
the queue, and again $A_{2,2}$ is introduced in the dependency arc table, but now as
not suspended. The important point is that the fact that $U_1$ inserts $A_{2,2}$ must not
be considered as a reprocessing, since $A_{2,2}$ had been suspended and its continuation
($A_{2,3}$ in this case) had not been handled by the algorithm yet. Hence, finally $A_{2,2}$
has not been multi-traversed. □

We define now the notion of relevant entry, which will be crucial for defining 
reduced certificates. The key observation is that those answer patterns whose com- 
putation has generated multi-traversed arcs should be available in the certificate. 

Definition 5.7 (relevant entry)

Let $P \in Prog$, $S_\alpha \subseteq AAtom$ and $\Omega \in QHS$. We say that the entry $A : CP \mapsto AP$
in the answer table is relevant w.r.t. $\Omega$ after the analysis of P for $S_\alpha$ iff there exists
a multi-traversed arc $\_ \Rightarrow [\_]\ A : CP$ w.r.t. $\Omega$ in the dependency arc table.

The notion of reduced certificate allows us to remove irrelevant entries from the 
answer table and produce a smaller certificate which can still be validated in one 
pass. 

Definition 5.8 (reduced certificate)

Let $P \in Prog$, $S_\alpha \subseteq AAtom$ and $\Omega \in QHS$. Let FCert = Analyze_f($S_\alpha$, $\Omega$) for P
and $S_\alpha$. We define the reduced certificate, RCert, as the set of relevant entries in
FCert w.r.t. $\Omega$.
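
Definitions 5.4 to 5.8 can be illustrated by computing a reduced certificate from a log of insertions into the dependency arc table. This is a sketch under stated assumptions: the log format (one record per insertion, carrying a hypothetical arc identifier, the call pattern of the body literal, and whether the arc was suspended at that point) is introduced here only for illustration.

```python
from collections import Counter

def reduced_certificate(fcert, dat_log):
    """Keep the entries of fcert whose call pattern has a multi-traversed arc.

    dat_log: iterable of (arc_id, call_pattern, suspended) records, one for
    each time an arc is introduced in the dependency arc table.
    """
    non_suspended = Counter(arc for arc, _, suspended in dat_log if not suspended)
    call_of = {arc: call for arc, call, _ in dat_log}
    relevant = {call_of[arc] for arc, n in non_suspended.items() if n >= 2}
    return {cp: ap for cp, ap in fcert.items() if cp in relevant}

ROOT = ("rectoy", ("int", "term"))
FCERT = {ROOT: ("int", "int")}

# Example 4.2: the arc for the recursive call is introduced twice as
# non-suspended (steps C and D), so its call pattern is relevant.
print(reduced_certificate(FCERT, [("A22", ROOT, False), ("A22", ROOT, False)]))

# Example 5.6: the first introduction is suspended, so the arc is not
# multi-traversed and the reduced certificate is empty.
print(reduced_certificate(FCERT, [("A22", ROOT, True), ("A22", ROOT, False)]))
```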

Example 5.9

From now on, in our running example, we assume the strategy $\Omega' \in QHS$ which
assigns the highest priority to redundant updates (see Proposition 5.3). For this
strategy, the entry $E_1'$ = rectoy(N,M) : (int,term) $\mapsto$ (int, int) in Example 4.2 is
not relevant since no arc has been multi-traversed. Therefore, the reduced
certificate for our running example is empty. In the following section, we show that our
checker is able to reconstruct the fixpoint in a single pass from the empty
certificate. It should be noted that, using $\Omega$ as in Example 4.2, the answer is obtained
by performing two analysis iterations over the arc associated to the second rule of
rectoy(N, M) (steps C and D) due to the fact that $U_1$ has been delayed and becomes
relevant for $\Omega$. Thus, this arc has been multi-traversed. □

Consider now the Java version of the procedure rectoy, borrowed from (Rose 1998):

    int rectoy(int n) {
        int m; int r;
        m = 0;
        if (n > 0) {
            n = n - 1;
            r = this.rectoy(n);
            m = n + r;
        }
        return m;   // Program point 30
    }


For this program, lightweight bytecode verification (LBV) (Rose 1998) sends,
together with the program, the reduced non-empty certificate
cert = ({30 $\mapsto$ ($\epsilon$, rectoy $\cdot$ int $\cdot$ int $\cdot$ $\bot$)}, $\epsilon$),
which states that at program point 30 the stack does not contain
information (first occurrence of $\epsilon$)$^2$ and variables n, m and r have type int, int and
$\bot$, respectively. The need for sending this information is because rectoy, implemented in Java,
contains an if-branch (equivalent to the branching for selecting one of our two
clauses for rectoy). In LBV, cert has to inform the checker that it is possible for
variable r at point 30 to be undefined, if the if condition does not hold. However,
in our method this is not necessary because the checker is able to reproduce this
information itself. Therefore, the above example shows that our approach improves
on state-of-the-art PCC techniques by reducing the certificate even further while
still keeping the checking process one-pass.

2 The second occurrence of $\epsilon$ indicates that there are no backward jumps.

5.3 Generation of Certificates without Irrelevant Entries

In this section, we instrument the analyzer of Algorithm 1 with the extensions
necessary for producing reduced certificates, as defined in Definition 5.8. Together
with the answer table returned by Algorithm 1, this new algorithm returns also
the set RED (initially empty) of call patterns which will form finally the reduced
certificate RCert. The resulting analyzer Analyze_r is presented in Algorithm 2.
Except for procedures PROCESS_ARC and INSERT_ANSWER_INFO, it uses the same
procedures as Algorithm 1, adapting them to the new syntax of arcs. Now, arcs will
be annotated with an integer value u which counts the number of times that the
arc has been traversed during the analysis. The first time that an arc is introduced
in the prioritized event queue, it is annotated with 0. Thus, L13 in Algorithm 1
must be replaced by:

13: add_event($arc(A_k(0) : CP \Rightarrow [CP_0]\ B_{k,1} : CP_1)$, $\Omega$)

Let us see the differences between Algorithm 2 and Algorithm 1:

1. We detect all multi-traversed arcs. When a call to PROCESS_ARC is generated,
this procedure checks if the arc is suspended (L13) before introducing the
corresponding arc in the dependency arc table. If the arc is suspended, then
its u-value is not modified, since, as explained before, it cannot be considered
as a reprocessing. Otherwise, the u-value is incremented by one. Furthermore,
if $B_{k,i}$ is not a constraint and u is greater than 1, then $B_{k,i} : CP_2$ is added
to the RED set, since this means that the arc has been multi-traversed. Note
that the RED set will contain in the end those call patterns whose analysis
launches the recomputation of some arc.

Another important issue is how to handle the continuation of the arc which
is being currently processed. If the arc is suspended, then no continuation
is introduced in the queue (checked by L4 and L9). Otherwise (L4), before
introducing the continuation in the queue, we check if the dependency arc
table already contains such a continuation (L6). In that case, we add the
arc with the same u annotation as that of the existing arc (L7). Otherwise, we
introduce the continuation as an arc initialized with 0 (L8).

2. We ignore redundant updates. Only non-redundant updates are processed by
procedure INSERT_ANSWER_INFO (L23). Each time an updated event is
generated, we check if $DAT|_{H:CP}$ is different from $\emptyset$ (L23). Only then, an updated
event for H:CP is generated (L24).



Example 5.10

Consider the four steps performed in the analysis of our running example. Step
A is identical. In step B the INSERT_ANSWER_INFO procedure detects a redundant
updated event (L23). No updated event is generated. Step C remains identical and
the arc $A_{2,2}$ (the only one able to contribute to the RED set) is annotated with 1,
and step D does not occur. As expected, upon return, the RED set remains empty.
□
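
The behaviour described in this example can be reproduced with a small executable model of the instrumented analyzer. The following Python sketch is not the CiaoPP implementation. It assumes the chain domain $\bot \sqsubseteq int \sqsubseteq real \sqsubseteq term$, abstracts every constraint of rectoy as "bind this variable to an int" (a simplification of the entries for "is", "+" and "-"), numbers rules and body positions from 0, models the strategy as a function `pick` that selects the index of the next event, and does not replace equivalent arcs already in the queue. The names `RULES`, `analyze`, `updates_last` and the flag `skip_redundant` are introduced here for illustration only.

```python
RANK = {"bot": 0, "int": 1, "real": 2, "term": 3}

def lub(a, b):
    return a if RANK[a] >= RANK[b] else b

def glb(a, b):
    return a if RANK[a] <= RANK[b] else b

# rectoy(N,M) :- N = 0, M = 0.
# rectoy(N,M) :- N1 is N-1, rectoy(N1,R), M is N1+R.
# Body literals: ("c", X, t) is a constraint binding X to a value of type t;
# ("a", pred, args) is a call to a user predicate.
RULES = {"rectoy": [
    (("N", "M"), [("c", "N", "int"), ("c", "M", "int")]),
    (("N", "M"), [("c", "N1", "int"), ("a", "rectoy", ("N1", "R")),
                  ("c", "M", "int")]),
]}

def updates_last(queue):
    """Strategy that delays updated events, as in the four-step walkthrough."""
    for idx, event in enumerate(queue):
        if event[0] != "updated":
            return idx
    return 0

def analyze(entry, pick, skip_redundant=True):
    at, dat, red = {}, {}, set()   # answer table, dependency arcs, RED
    queue = [("newcall", entry)]
    while queue:
        event = queue.pop(pick(queue))
        if event[0] == "newcall":
            call = event[1]
            if call in at:
                continue
            at[call] = None        # None encodes the bottom answer
            for k, (head, _) in enumerate(RULES[call[0]]):
                queue.append(("arc", (call, k, 0), dict(zip(head, call[1])), 0))
        elif event[0] == "updated":
            for arc, (env, u, callee) in list(dat.items()):
                if callee == event[1]:
                    queue.append(("arc", arc, env, u))
        else:
            _, arc, env, u = event
            key, k, i = arc
            head, body = RULES[key[0]][k]
            lit = body[i]
            new_env = dict(env)
            if lit[0] == "c":
                new_env[lit[1]] = glb(env.get(lit[1], "term"), lit[2])
            else:
                callee = (lit[1], tuple(env.get(a, "term") for a in lit[2]))
                if callee not in at:
                    queue.append(("newcall", callee))
                answer = at.get(callee)
                if answer is None:              # suspended: u is not modified
                    dat[arc] = (env, u, callee)
                    continue
                dat[arc] = (env, u + 1, callee)
                if u + 1 > 1:                   # multi-traversed arc
                    red.add(callee)
                for a, t in zip(lit[2], answer):
                    new_env[a] = glb(new_env.get(a, "term"), t)
            if i + 1 < len(body):               # continuation reuses a stored u
                nxt = (key, k, i + 1)
                queue.append(("arc", nxt, new_env, dat[nxt][1] if nxt in dat else 0))
            else:
                ap = tuple(new_env.get(v, "term") for v in head)
                old = at[key]
                new = ap if old is None else tuple(map(lub, old, ap))
                if new != old:
                    at[key] = new
                    depends = any(c == key for _, _, c in dat.values())
                    if depends or not skip_redundant:
                        queue.append(("updated", key))
    return at, red

ENTRY = ("rectoy", ("int", "term"))
print(analyze(ENTRY, updates_last, skip_redundant=False))
print(analyze(ENTRY, updates_last, skip_redundant=True))
```

Under these assumptions both calls compute the same answer table, with the single entry rectoy : (int, term) mapped to (int, int). When updated events are delayed and redundant updates are not filtered, the arc for the recursive call is traversed twice and its call pattern ends up in RED; when redundant updates are filtered, RED stays empty, in line with the example above.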

5.4 Correctness of Certification 

This section shows the correctness of the certification process carried out to
generate reduced certificates, based on the correctness of the certification with full
certificates of (Albert et al. 2008). First, note that, except for the control of
relevant entries, Analyze_f($S_\alpha$, $\Omega$) and Analyze_r($S_\alpha$, $\Omega$) have the same behavior
and thus compute the same answer table.

Algorithm 2 Analyze_r: Analyzer instrumented for Certificate Reduction

 1: procedure PROCESS_ARC($H_k(u) : CP_0 \Rightarrow [CP_1]\ B_{k,i} : CP_2 \in Dep$, $\Omega \in QHS$)
 2:   $W$ := vars($H_k, B_{k,1}, \ldots, B_{k,n_k}$);
 3:   $CP_3$ := GET_ANSWER($B_{k,i} : CP_2$, $CP_1$, $W$, $\Omega$);
 4:   if $CP_3 \neq \bot$ and $i \neq n_k$ then
 5:     $CP_4$ := Arestrict($CP_3$, vars($B_{k,i+1}$));
 6:     if there exists the arc $H_k(w) : \_ \Rightarrow [\_]\ B_{k,i+1} : \_$ in
        the dependency arc table then
 7:       add_event($arc(H_k(w) : CP_0 \Rightarrow [CP_3]\ B_{k,i+1} : CP_4)$, $\Omega$);
 8:     else add_event($arc(H_k(0) : CP_0 \Rightarrow [CP_3]\ B_{k,i+1} : CP_4)$, $\Omega$);
 9:   else if $CP_3 \neq \bot$ and $i = n_k$ then
10:     $AP_1$ := Arestrict($CP_3$, vars($H_k$));
11:     INSERT_ANSWER_INFO($H : CP_0 \mapsto AP_1$, $\Omega$);
12:   if $B_{k,i}$ is not a constraint then
13:     if $CP_3 = \bot$ then
14:       add $H_k(u) : CP_0 \Rightarrow [CP_1]\ B_{k,i} : CP_2$ to dependency arc table;
15:     else % non-suspended arc
16:       add $H_k(u+1) : CP_0 \Rightarrow [CP_1]\ B_{k,i} : CP_2$ to dependency arc table;
17:       if $u+1 > 1$ then add $B_{k,i} : CP_2$ to RED;
18: procedure INSERT_ANSWER_INFO($H : CP \mapsto AP \in Entry$, $\Omega \in QHS$)
19:   $AP_0$ := LOOKUP_ANSWER($H : CP$, $\Omega$);
20:   $AP_1$ := Alub($AP$, $AP_0$);
21:   if $AP_0 \neq AP_1$ then % updated required
22:     add_answer_table($H : CP \mapsto AP_1$);
23:     if $DAT|_{H:CP} \neq \emptyset$ then % non-redundant updated
24:       add_event(updated($H : CP$));



Proposition 5.11

Let $P \in Prog$, $D_\alpha \in ADom$, $S_\alpha \subseteq AAtom$, $\Omega, \Omega' \in QHS$. Let AT be the answer
table computed by Analyze_r($S_\alpha$, $\Omega'$). Then, Analyze_f($S_\alpha$, $\Omega$) = AT.

Proof

First note that except for the u-annotations, the procedures PROCESS_ARC in
Algorithms 1 and 2 are similar. In fact, there is a one-to-one correspondence between
the definition of both procedures. Concretely, we have the following mapping:

[Table: line-by-line mapping between procedure PROCESS_ARC of Analyze_f and of Analyze_r]

The only difference between Algorithms 1 and 2 lies in INSERT_ANSWER_INFO. For
the case of Algorithm 2, redundant updates are never introduced in the prioritized
event queue (L23). Then, let us choose a new strategy $\Omega''$, identical to $\Omega'$ except
when dealing with redundant updates. For redundant updates, let us assume that
$\Omega''$ processes them immediately after being introduced in the event queue. Such
processing does not generate any effect since the dependency arc table does not contain
arcs to be launched for these updates. Hence it holds that Analyze_r($S_\alpha$, $\Omega'$)
generates the same answer table AT as Analyze_f($S_\alpha$, $\Omega''$). From Proposition 5.3
it holds that Analyze_f($S_\alpha$, $\Omega$) = Analyze_f($S_\alpha$, $\Omega''$) and the claim follows. □

The following definition presents the certifier for reduced certificates. 

Definition 5.12

We define the function Certifier_r : $Prog \times ADom \times 2^{AAtom} \times APol \times QHS \mapsto ACert$,
which takes $P \in Prog$, $D_\alpha \in ADom$, $S_\alpha \subseteq AAtom$, $I_\alpha \in APol$, $\Omega \in QHS$. It
returns as certificate $RCert = \{A : CP \mapsto AP \in FCert \mid A : CP \in RED\}$, where
$(FCert, RED)$ = Analyze_r($S_\alpha$, $\Omega$), iff $FCert \sqsubseteq I_\alpha$.

Finally, we can establish the correctness of Certifier_r, which amounts to saying that
RCert contains all relevant entries in FCert.

Theorem 5.13

Let $P \in Prog$, $D_\alpha \in ADom$, $S_\alpha \subseteq AAtom$, $I_\alpha \in APol$ and $\Omega \in QHS$. Let
FCert = Analyze_f($S_\alpha$, $\Omega$) and RCert = Certifier_r($P$, $D_\alpha$, $S_\alpha$, $I_\alpha$, $\Omega$). Then, an
entry $A : CP \mapsto AP \in FCert$ is relevant w.r.t. $\Omega$ iff $A : CP \mapsto AP \in RCert$.

Proof

According to Definition 5.12, $RCert = \{A : CP \mapsto AP \in FCert \mid A : CP \in RED\}$,
where $(FCert, RED)$ = Analyze_r($S_\alpha$, $\Omega$). Hence, it is enough to prove that an entry
$A : CP \mapsto AP \in FCert$ is relevant w.r.t. $\Omega$ iff $A : CP \in RED$.

($\Leftarrow$) Assume that $A : CP \in RED$. Then, from L16 and L17 it holds that there exists
an arc $H(u) : CP' \Rightarrow [\_]\ A : CP$ in the dependency arc table such that $u > 1$. But
the u-value of an arc can only be increased in procedure PROCESS_ARC (L16) after
checking that $CP_3$ is different from $\bot$ (L15). But $CP_3$ is computed by means of
GET_ANSWER (L3), which calls LOOKUP_ANSWER (L27). This last function only
returns a value different from $\bot$ if A : CP has an entry in the answer table (L30
and L31). Since $u > 1$, u has been incremented at least twice and, as argued
before, in both cases the answer table contained an entry for A : CP, i.e., by
Definition 5.5 the arc $H : CP' \Rightarrow [\_]\ A : CP$ is multi-traversed w.r.t. $\Omega$. Hence, by
Definition 5.7, $A : CP \mapsto AP$ is a relevant entry.

($\Rightarrow$) Assume now that the entry $A : CP \mapsto AP$ is relevant w.r.t. $\Omega$. Then, by
Definition 5.7 there exists an arc $H : CP' \Rightarrow [\_]\ A : CP$ in the dependency
arc table which has been multi-traversed. By Definition 5.5 this arc has been
introduced in the dependency arc table at least twice as a non-suspended arc. But
arcs are introduced in DAT via procedure PROCESS_ARC and each time the arc
is non-suspended (L15) its u-value is increased by 1 (L16). Hence the u-value for
$H : CP' \Rightarrow [\_]\ A : CP$ is at least 2. Now, L17 ensures that $A : CP \in RED$.

□

6 Checking Reduced Certificates

In the ACC framework for full certificates (Albert et al. 2005), a concrete checking
algorithm is used with a specific graph traversal strategy which we will refer to as
$\Omega_C$. This checker has been shown to be very efficient (i.e., this particular $\Omega_C$ is
a good choice) but here we would like to consider a more generic design for the
checker in which it is parametric on $\Omega_C$ in addition to being parametric on the
abstract domain$^3$. This lack of parametricity on $\Omega_C$ was not an issue in the original
formulation of ACC in (Albert et al. 2005) since there full certificates were used.
Note that even if the certifier uses a strategy $\Omega_A$ which is different from $\Omega_C$, all
valid full certificates are guaranteed to be validated in one pass by that specific
checker, independently of $\Omega_C$. This result allowed using a particular strategy in the
checker without loss of generality. However, the same result does not hold any more
in the case of reduced certificates. In particular, completeness of checking is not
guaranteed if $\Omega_A \neq \Omega_C$. This occurs because, though the answer table is identical
for all strategies, the subset of redundant entries depends on the particular strategy
used. The problem is that, if there is an entry $A : CP \mapsto AP$ in FCert such that it
is relevant w.r.t. $\Omega_C$ but it is not w.r.t. $\Omega_A$, then a single-pass checker will fail to
validate the RCert generated using $\Omega_A$. In this section, we design a generic checker
which is not tied to a particular graph traversal strategy. In practice, upon agreeing 
on the appropriate parameters, the consumer uses the particular instance of the 
generic checker resulting from the application of such parameters. In a particular 
application of our framework, we expect that the graph traversal strategy is agreed 
a priori between consumer and producer. Alternatively, if necessary (e.g., when the 
consumer does not implement this strategy), the strategy can be sent along with 
the certificate in the transmitted package. 

It should be noted that the design of generic checkers is also relevant in light of
current trends in verified analyzers (e.g., (Klein and Nipkow 2003; Cachera et al. 2004)),
which could be transferred directly to the checking end. In particular, since the
design of the checking process is generic, it becomes feasible in ACC to use automatic
program transformation techniques (Jones et al. 1993) to specialize a certified
(specific) analysis algorithm in order to obtain a certified checker with the same
strategy while preserving correctness and completeness.

6.1 The Generic Checking Algorithm 

The following definition presents a generic checker for validating reduced
certificates. In addition to the genericity issue discussed above, an important
difference with the checker for full certificates (Albert et al. 2005) is that there are
certain entries which are not available in the certificate and that we want to
reconstruct and output in checking. The reason for this is that the safety policy has
to be tested w.r.t. the full answer table (Equation (2)). Therefore, the checker must
reconstruct, from RCert, the answer table returned by Analyze_f, FCert, in order to
test for adherence to the safety policy (Equation (4)). Note that reconstructing the
answer table does not add any additional cost compared to the checker in
(Albert et al. 2005), since the full answer table also has to be created in
(Albert et al. 2005).

3 Note that both the analysis and checking algorithms are always parametric on the abstract
domain. This genericity allows proving a wide variety of properties by using the large set of
available abstract domains, this being one of the fundamental advantages of ACC.



Algorithm 3 Generic Checker for Reduced Certificates Checking_r

 1: procedure INSERT_ANSWER_INFO(H:CP ↦ AP ∈ Entry, Ω ∈ QHS)
 2:    AP0 := LOOKUP_ANSWER(H:CP, Ω);
 3:    AP1 := Alub(AP, AP0);
 4:    (IsIn, AP') = LOOK_FIXPOINT(H:CP, RCert);
 5:    if IsIn and Alub(AP, AP') ≠ AP' then return error;   % error of type a)
 6:    if AP0 ≠ AP1 then                                     % update required
 7:       if IsIn and AP0 = ⊥ then AP1 = AP'
 8:       add_answer_table(H:CP ↦ AP1);
 9:       if DAT|H:CP ≠ ∅ then
10:          add_event(updated(H:CP), Ω);
11: function LOOK_FIXPOINT(A:CP ∈ AAtom, RCert ∈ ACert)
12:    if ∃ a renaming σ such that σ(A:CP ↦ AP) ∈ RCert then
13:       return (true, σ⁻¹(AP));
14:    else return (false, ⊥);



Definition 6.1 (checker for reduced certificates)

Function Checking_r is defined as function Analyze_r with the following
modifications:

1. It receives RCert as an additional input parameter.

2. It does not use the set RED and it replaces L11 of Algorithm 2 with:

   11: If u+1 > 1 return error

3. If it fails to produce an answer table, then it issues an error.

4. Function INSERT_ANSWER_INFO is replaced by the new one in Algorithm 3.

Function Checker_r takes P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol,
Ω ∈ QHS, RCert ∈ ACert and returns:

1. error if Checking_r(S_α, Ω, RCert) for P in D_α returns error.

2. Otherwise it returns FCert = Checking_r(S_α, Ω, RCert) for P and D_α iff
   FCert ⊑ I_α.
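The effect of replacing L11 (item 2 above) can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the dependency arc table is modelled as a plain dictionary from an arc label to its u-value, a suspended traversal is assumed to leave u unchanged, and the names `CheckingError` and `record_traversal` are hypothetical (they do not come from the paper or from CiaoPP).

```python
class CheckingError(Exception):
    """Hypothetical error: the single-pass checker would have to iterate."""


def record_traversal(dat, arc, suspended):
    """Store `arc` in the dependency arc table `dat` and return its u-value.

    Assumption: `dat` maps an arc label to the number of times the arc has
    been traversed while not suspended (its u-value).
    """
    u = dat.get(arc, 0)
    if not suspended:
        # Checker version of L11: a second non-suspended traversal is an error.
        if u + 1 > 1:
            raise CheckingError(f"arc {arc!r} would be multi-traversed")
        u += 1
    dat[arc] = u
    return u


if __name__ == "__main__":
    dat = {}
    print(record_traversal(dat, "q -> p", suspended=True))   # callee has no answer yet
    print(record_traversal(dat, "q -> p", suspended=False))  # first real traversal
    try:
        record_traversal(dat, "q -> p", suspended=False)     # would need iteration
    except CheckingError as err:
        print("error:", err)
```

In the analyzer the same counter is used to populate RED instead of raising an error; the sketch only covers the checker side.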

Let us briefly explain the differences between Algorithms 2 and 3. First, the checker
has to detect (and issue) two sources of errors:

a) The answer in the certificate and the one obtained by the checker differ
   (L5). This is the traditional error in ACC and means that the certificate
   and program at hand do not correspond to each other. The call to function
   LOOK_FIXPOINT(H : CP, RCert) in L4 returns a tuple (IsIn, AP') such that:
   if H : CP is in RCert, then IsIn is equal to true and AP' returns the fixpoint
   stored in RCert. Otherwise, IsIn is equal to false and AP' is ⊥.

b) Recomputation is required. This should not occur during checking, i.e., no
   arcs must be multi-traversed by the checker (L11). This second type of error
   corresponds to situations in which some non-redundant update is needed in
   order to obtain an answer (it cannot be obtained in one pass). This is detected
   in L11, once it has been checked that the arc is not suspended (L9) and that it
   has been traversed before, i.e., its u-value is greater than 1. Note that we flag
   this as an error because the checker would have to iterate and the description
   we provided does not include support for it. In general, however, it is also
   possible to use a checker that is capable of iterating. In that case, of course,
   the certificates transmitted can be even smaller than the reduced ones, at the
   cost of increased checking time (as well as some additional complexity in the
   checking code). This allows supporting different tradeoffs between certificate
   size, checking time, and checker code complexity.

The second difference is that the A : CP ↦ AP' entries stored in RCert have to be
added to the answer table after finding the first partial answer for A : CP (different
from ⊥), in order to detect errors of type a) above. In particular, L7 and L8 add
the fixpoint AP' stored in RCert to the answer table.
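As a rough illustration of how INSERT_ANSWER_INFO and LOOK_FIXPOINT in Algorithm 3 interact, the following Python sketch transcribes them over a toy domain. Everything outside the algorithm itself is an assumption made for the example: the abstract domain is the chain bot < int < real < term, variable renaming is omitted (call patterns are assumed to be already normalized strings), tables are dictionaries, and the names `CertificateMismatch`, `alub`, `look_fixpoint` and `insert_answer_info` are hypothetical.

```python
ORDER = ["bot", "int", "real", "term"]  # toy chain domain (assumption)


def alub(a, b):
    """Least upper bound on the toy chain."""
    return max(a, b, key=ORDER.index)


class CertificateMismatch(Exception):
    """Hypothetical error of type a): certificate and program disagree."""


def look_fixpoint(call, rcert):
    """Return (is_in, fixpoint); renaming is omitted in this sketch."""
    if call in rcert:
        return True, rcert[call]
    return False, "bot"


def insert_answer_info(call, ap, answer_table, rcert, dat, events):
    ap0 = answer_table.get(call, "bot")            # L2
    ap1 = alub(ap, ap0)                            # L3
    is_in, ap_fix = look_fixpoint(call, rcert)     # L4
    if is_in and alub(ap, ap_fix) != ap_fix:       # L5: error of type a)
        raise CertificateMismatch(call)
    if ap0 != ap1:                                 # L6: update required
        if is_in and ap0 == "bot":                 # L7: first partial answer
            ap1 = ap_fix
        answer_table[call] = ap1                   # L8
        if dat.get(call):                          # L9: some arc depends on call
            events.append(("updated", call))       # L10


if __name__ == "__main__":
    table, events = {}, []
    rcert = {"p(X):term": "real"}
    dat = {"p(X):term": ["q(X):term => p(X):term"]}
    insert_answer_info("p(X):term", "int", table, rcert, dat, events)
    print(table, events)
```

In the usage above the first partial answer computed for p is int, but the stored fixpoint real is what enters the answer table, so a later partial answer real causes no further update.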

Example 6.2 

All steps given for the analysis of Example 5.10 are identical in Checker_r except
for the detection of possible errors. Errors of type a) are not possible since RCert is 
empty. An error of type b) can only be generated because of the u value of arc 
^2,2- However note that in step C, this arc is introduced in the queue with u = 0. 
After processing the arc, the arc goes to the dependency arc table with u = 1. But 
since no updated events are generated, this arc is no longer processed. Hence, the 
program is validated in a single pass over the graph. □ 

6.2 Correctness of Checking 

In this section we prove the correctness of the checking process, which amounts to 
saying that if Checker_R does not issue an error when validating a certificate, then 
the reconstructed answer table is a fixpoint verifying the given input safety policy. 
As a previous step, we prove the following proposition in which we also ensure that 
the validation of the certificate is done in one pass. 

Proposition 6.3 

Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω ∈ QHS. Let FCert =
Certifier_f(P, D_α, S_α, I_α, Ω), RCert = Certifier_r(P, D_α, S_α, I_α, Ω). Then
Checking_r(S_α, Ω, RCert) does not issue an error and it returns FCert. Furthermore,
the validation of FCert does not generate multi-traversed arcs.

Proof 

Let us consider first the call:

(*) Checking_r(S_α, Ω, RCert)

For this call, let us prove that (1) it does not issue an error and (2) it returns FCert
as result.

(1) Checking_r(S_α, Ω, RCert) does not issue an error.

Errors of type (a) (L5 of Algorithm 3) are not possible since, from Definition 5.12,
RCert ⊆ FCert, where FCert is the answer table computed by Analyze_f(S_α, Ω).
The correctness of Algorithm Analyze_f(S_α, Ω) (see (Hermenegildo et al. 2000))
avoids this kind of error.

Errors of type (b) can only occur in L11 of procedure PROCESS_ARC (Algorithm
3), for some arc H : CP0 ⇒ [CP1] B_{k,i} : CP2. Since we follow the same strategy
in Checking_r and Analyze_r, then Analyze_r(S_α, Ω) introduces B_{k,i} : CP2
in RED (L11 of Algorithm 2), and thus, Definition 5.12 ensures that B_{k,i} : CP2 ↦
AP ∈ RCert. But this is a contradiction since for all entries in RCert, the first
time that the arc is processed without answer in AT for B_{k,i} : CP2, Algorithm 3
(L7 and L8) introduces B_{k,i} : CP2 ↦ AP in AT together with the corresponding
event updated(B_{k,i} : CP2). So when Ω selects this event, the new event arc(H :
CP0 ⇒ [CP1] B_{k,i} : CP2) is again introduced in the prioritized event queue. When
this arc is selected by Ω, the arc goes again to DAT. But since B_{k,i} : CP2 ↦
AP ∈ AT, no more events of the form updated(B_{k,i} : CP2) may occur (L6 of
INSERT_ANSWER_INFO(B_{k,i} : CP2) in Algorithm 3 never holds). Hence, no more
calls to PROCESS_ARC for arc(H : CP0 ⇒ [CP1] B_{k,i} : CP2) occur. Then the u-value
for this arc will be at most 1 and no error will be generated.

(2) The call (*) returns FCert.

The only differences between the call (*) and the call Analyze_r(S_α, Ω) lie in
procedure INSERT_ANSWER_INFO and L11 of procedure PROCESS_ARC. Since (1) ensures
that no error is issued by (*), then L5 and L11 of Algorithm 3 are never executed.
Then, it is trivial that (*) computes an answer table AT as result. Furthermore,
since (*) and Analyze_r(S_α, Ω) use the same strategy, the only difference is in the
prioritized event queue since for (*) no relevant updates will appear in the queue.
Instead of this, the real fixpoints in RCert ⊆ FCert are introduced in AT in L7
and L8 of INSERT_ANSWER_INFO. Except for this fact, Algorithms 2 and 3 behave
identically and thus (*) computes FCert as result.

Finally, proving that the validation of RCert does not generate multi-traversed
arcs is trivial since, by definition, multi-traversed arcs correspond to arcs in DAT
with the u-value greater than 1. Since the call (*) does not issue an error, L11 of
Algorithm 3 is never executed, i.e., no arc is multi-traversed.
□ 



Corollary 6.4 

Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω ∈ QHS. Let
FCert = Certifier_f(P, D_α, S_α, I_α, Ω), and RCert_Ω = Certifier_r(P, D_α, S_α, I_α, Ω).
If Checking_r(S_α, Ω, RCert), RCert ∈ ACert, does not issue an error, then it
returns FCert and RCert_Ω ⊆ RCert. Furthermore, the validation of FCert does not
generate multi-traversed arcs.

Proof 

Let us prove, by contradiction, that RCert_Ω ⊆ RCert. If we assume that RCert_Ω ⊈
RCert, then there exists an entry A : CP ↦ AP ∈ RCert_Ω such that A : CP ↦
AP ∉ RCert. By definition of RCert_Ω, A : CP ∈ RED. Hence, L11 of Algorithm 2
ensures that there exists an arc H : CP0(u) ⇒ [CP1] A : CP in DAT with u > 1.
But this is not possible since otherwise the call Checking_r(S_α, Ω, RCert_Ω) would
issue an error, which is a contradiction by Proposition 6.3.

Now observe that from Proposition 6.3 it holds that Checking_r(S_α, Ω, RCert_Ω)
returns FCert and the validation of RCert_Ω does not generate multi-traversed arcs.
But since RCert_Ω ⊆ RCert, then it trivially holds that Checking_r(S_α, Ω, RCert)
also returns FCert exactly in the same way that Checking_r(S_α, Ω, RCert_Ω) does,
i.e., without generating multi-traversed arcs. □

Theorem 6.5 (correctness)

Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol, Ω ∈ QHS and RCert ∈
ACert. Then, if Checker_r(P, D_α, S_α, I_α, Ω, RCert) does not issue an error and
returns a certificate FCert ∈ ACert, then

• FCert is a fixpoint of P;

• FCert ⊑ I_α.

Proof 

If Checker_r(P, D_α, S_α, I_α, Ω, RCert) does not issue an error then, from
Definition 6.1, it holds that FCert = Checking_r(S_α, Ω, RCert) does not issue an error
and FCert ⊑ I_α. From Corollary 6.4 it follows that FCert = Certifier_f(P, D_α, S_α, I_α, Ω).
Hence, as Definition 4.3 establishes, FCert is the answer table computed by
Analyze_f(S_α, Ω). Finally, by the results in (Hermenegildo et al. 2000), FCert is a
fixpoint for P.
□ 

6.3 Completeness of Checking 

The following theorem (completeness) provides sufficient conditions under which a
checker is guaranteed to validate reduced certificates which are actually valid. In
other words, if a certificate is valid and such conditions hold, then the checker is
guaranteed to validate the certificate. Note that this is not always the case when the
strategy used to generate the certificate and the one used to check it are different.

Theorem 6.6 (completeness)

Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω_A ∈ QHS. Let FCert =
Certifier_f(P, D_α, S_α, I_α, Ω_A) and RCert_{Ω_A} = Certifier_r(P, D_α, S_α, I_α, Ω_A). Let
Ω_C ∈ QHS be such that RCert_{Ω_C} = Certifier_r(P, D_α, S_α, I_α, Ω_C) and RCert_{Ω_A} ⊇
RCert_{Ω_C}. Then, Checker_r(P, D_α, S_α, I_α, Ω_C, RCert_{Ω_A}) returns FCert and does not
issue an error.

Proof 

We prove it by contradiction. The only cases in which Checker_r(P, D_α, S_α, I_α,
Ω_C, RCert_{Ω_A}) issues an error are the following:

• The partial answer AP computed for some calling pattern A : CP (provided in
RCert_{Ω_A}) leads to Alub(AP, AP') ≠ AP' (L5), where AP' is the answer for A : CP,
i.e., A : CP ↦ AP' ∈ RCert_{Ω_A}. But, RCert_{Ω_A} ⊆ FCert, i.e., FCert would contain
an incorrect answer for A : CP, which is a contradiction with the assumption that
FCert is a valid certificate for P.

• There exists some arc H : CP ⇒ [CP1] B : CP2 which has been traversed more
than once, i.e., its u-value is greater than 1 (L11 in Algorithm 3). Since RCert_{Ω_C} ⊆
RCert_{Ω_A}, i.e., RCert_{Ω_C} contains possibly fewer entries than RCert_{Ω_A}, then the call
(*) in Theorem 6.5 fails also because of such a multi-traversed arc. But this is a
contradiction with (1) in Theorem 6.5.

Consequently, Checker_r(P, D_α, S_α, I_α, Ω_C, RCert_{Ω_A}) returns an answer table AT.
Finally, by Theorem 6.5 we know that since no error is issued, then Checker_r
returns FCert. □

Obviously, if Ω_C = Ω_A then the checker is guaranteed to be complete. Additionally,
a checker using a different strategy Ω_C is also guaranteed to be complete as
long as the certificate reduced w.r.t. Ω_C is equal to or smaller than the certificate
reduced w.r.t. Ω_A. Furthermore, if the certificate used is full, the checker is complete
for any strategy. Note that if RCert_{Ω_A} ⊉ RCert_{Ω_C}, Checker_r with the strategy
Ω_C may fail to validate RCert_{Ω_A}, which is indeed valid for the program under Ω_A.

Example 6.7 

Consider the program of Example 3.1, the same abstract domain D_α as in our
running example, and the call pattern S_α = {q(X):(term)}. The full certificate
computed by Certifier_f is FCert = {q(X):(term) ↦ (real), p(X):(term) ↦ (real)}.
Let us consider two different queue handling strategies Ω_A ≠ Ω_C. Under both
strategies, we start the analysis introducing q(X):(term) ↦ ⊥ in the answer table and
processing the single rule for q. The arc q(X)(0):(term) ⇒ [{X/term}] p(X):(term)
is introduced in the queue and processed afterward. As a result, q(X)(0):(term)
⇒ [{X/term}] p(X):(term) goes to DAT and event newcall(p(X):(term)) is generated.
The processing of this last event adds p(X):(term) ↦ ⊥ to the answer table.
Now, using Ω_A, the analyzer processes both rules for p(X) in textual order.
None of the arcs introduced in DAT can issue an error. After traversing the first
rule, answer p(X):(term) ↦ (real) is inferred and the non-redundant event
updated(p(X):(term)) is generated. The analysis of the second rule produces as
answer (int) and does not update the entry since Alub({X/real}, {X/int}) returns
{X/real}. We process the non-redundant update for p by calling function
ADD_DEPENDENT_RULES. The arc for q stored in the dependency arc table with
u = 0 is launched. When processing this arc, again the arc is introduced in DAT with
u = 1, and the answer q(X):(term) ↦ (real) replaces the old one in the answer
table. Since RED is empty, then RCert_{Ω_A} is empty.

Assume now that Ω_C assigns a higher priority to the second rule of p. In this case,
the answer for p(X):(term) changes from ⊥ to {X/int}, producing a non-redundant
update. Suppose now that the updated event is processed, which launches the arc
for q stored in DAT. If we process such an arc, then it will be introduced again in
DAT, but now with u = 1. Answer q(X):(term) ↦ {X/int} is inserted in the answer
table. When the first arc for p is processed, the computed answer is {X/real}. Now,
a new non-redundant updated event is needed. The processing of this update event
launches again the arc for q stored in DAT, whose analysis introduces it in DAT
with u = 2.

Hence RCert_{Ω_A} is empty but RCert_{Ω_C} contains the single entry p(X):(term) ↦
(real). Thus, Checker_r(P, D_α, S_α, I_α, Ω_C, RCert_{Ω_A}) will issue an error (L11) when
trying to validate the program if provided with the empty certificate RCert_{Ω_A}. On
the contrary, by Theorem 6.6, Checker_r(P, D_α, S_α, I_α, Ω_A, RCert_{Ω_C}) returns FCert
and does not issue an error. This justifies the results intuitively shown in Section 3.
□ 
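The bookkeeping in Example 6.7 can be replayed with a deliberately small Python sketch. It is not the generic analyzer: it only tracks the answer for p and the u-value of the single arc from q to p, and it models a strategy by two assumed parameters, the order in which the rules of p yield their answers and whether a pending updated event is processed before the remaining rules (`eager`). The chain bot < int < real < term and all function names are hypothetical.

```python
ORDER = ["bot", "int", "real", "term"]  # toy chain domain (assumption)


def alub(a, b):
    """Least upper bound on the toy chain."""
    return max(a, b, key=ORDER.index)


def arc_traversals(rule_answers, eager):
    """Return (final answer for p, u-value of the arc q -> p)."""
    answer, u, pending = "bot", 0, False
    for ap in rule_answers:
        new = alub(answer, ap)
        if new != answer:               # non-redundant update of p's answer
            answer, pending = new, True
        if eager and pending:           # updated event handled right away
            u, pending = u + 1, False
    if pending:                         # updated event handled after all rules
        u += 1
    return answer, u


def reduced_certificate(rule_answers, eager):
    """p's entry is kept only if the arc q -> p was multi-traversed (u > 1)."""
    answer, u = arc_traversals(rule_answers, eager)
    return {"p(X):term": answer} if u > 1 else {}


def superset(cert_a, cert_c):
    """Sufficient condition of Theorem 6.6: cert_a contains cert_c."""
    return cert_a.items() >= cert_c.items()


if __name__ == "__main__":
    rcert_a = reduced_certificate(["real", "int"], eager=False)  # plays the role of Ω_A
    rcert_c = reduced_certificate(["int", "real"], eager=True)   # plays the role of Ω_C
    print(rcert_a, rcert_c)
    print(superset(rcert_a, rcert_c), superset(rcert_c, rcert_a))
```

Under these assumptions the first strategy yields an empty reduced certificate and the second one keeps the entry for p, so the containment required by Theorem 6.6 holds only when the certificate produced with the second strategy is checked with the first.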



7 Discussion and Experimental Evaluation 

As we have illustrated throughout the paper, the reduction in the size of the
certificates is directly related to the number of updates (or iterations) performed
during analysis. Clearly, depending on the "quality" of the graph traversal strategy
used, different instances of the generic analyzer will generate reduced certificates
of different sizes. Significant and successful efforts have been made during recent
years towards improving the efficiency of analysis. The most optimized analyzers
actually aim at reducing the number of updates necessary to reach the final fixpoint
(Puebla and Hermenegildo 1996). Interestingly, our framework greatly benefits
from all these advances, since the more efficient the analysis, the smaller the
corresponding reduced certificates. We have implemented a generator and a checker
of reduced certificates as an extension of the efficient, highly optimized, state-of-
the-art analysis system available in CiaoPP. Both the analysis and checker use the
optimized depth-first new-calling QHS of (Puebla and Hermenegildo 1996).



In our experiments we study two crucial points for the practicality of our
proposal: the size of reduced vs. full certificates (Table 1) and the relative
efficiency of checking reduced vs. full certificates (Table 2). As mentioned before,
the algorithms are parametric w.r.t. the abstract domain. In all our experiments
we use the same implementation of the domain-dependent functions of the
sharing+freeness (Muthukumar and Hermenegildo 1991) abstract domain. We have
selected this domain because it is highly optimized and also because the information
it infers is very useful for reasoning about instantiation errors, which is a crucial
aspect for the safety of logic programs. Furthermore, as mentioned previously,
sharing domains have also been shown to be useful for checking properties of
imperative programs, including for example information flow characteristics of Java
bytecode (Secci and Spoto 2005; Genaim and Spoto 2005). On the other hand, we
have used ⊤ as call patterns in order to get all possible modes of use of predicate
calls.

Program      Source    ByteC     BC/S    FCert    RCert       F/R      R/S
deriv                           4.411      288      288     1.000    0.301
grammar        1598     3182    1.991     1259       40    31.475    0.025
hanoiapp       1172     2264    1.932     2325      880     2.642    0.751
occur          1367     6919    5.061     1098      666     1.649    0.487
progeom        1619     3570    2.205     2148       40    53.700    0.025
qsortapp        664     1176    1.771     2355      650     3.623    0.979
query          2090     8818    4.219      531       40    13.275    0.019
rdtok         13704    15423    1.125     6533     2659     2.457    0.194
rectoy          154      140    0.909      167       40     4.175    0.260
serialize       987     3801    3.851     1779     1129     1.576    1.144
zebra          2284     5396    2.363     4058       40   101.450    0.018

Overall                          2.17                         3.35     0.28

Table 1. Size of Reduced and Full Certificates

The whole system is written in Ciao (Bueno et al. 2009) and the experiments
have been run using version 1.13r5499 with compilation to bytecode on a Pentium
4 (Xeon) at 2 GHz and with 4 GB of RAM, running GNU Linux Fedora Core-2
2.6.9.

A relatively wide range of programs has been used as benchmarks. They are the
same ones used in (Hermenegildo et al. 2000; Albert et al. 2005), where they are
described in more detail.

7.1 Size of Reduced Certificates

Table 1 shows our experimental results regarding certificate size reduction, coded
in compact (fastread) format, for the different benchmarks. It compares the size
of each reduced certificate to that of the full certificate and to the corresponding
source code for the same program.

The column Source shows the size of the source code and ByteC its corresponding
bytecode. To make this comparison fair, in column BC/S we subtract
4180 bytes from the size of the bytecode for each program: the size of the bytecode
for an empty program in this version of Ciao (minimal top-level drivers and
exception handlers for any executable). The size of the certificates is shown in the
following columns. The columns FCert and RCert contain the size of the full and
reduced certificates, respectively, for each benchmark, and they are compared in the
next column (F/R). Our results show that the reduction in size is quite significant
in most cases. It ranges from 101.45 in zebra (RCert is indeed empty: the minimum
size of an empty certificate is 40 bytes, whereas FCert is 4058) to 1 for deriv (both
certificates have the same size).

Program       C_F    C_R    C_F/C_R
aiakl          85     86      0.986
bid            46     48      0.959
browse         20     20      0.990
deriv          28     27      1.038
grammar        14     14      1.014
hanoiapp       31     30      1.033
occur          18     20      0.911
progeom        17     16      1.012
qsortapp       24     19      1.290
query          13     14      0.917
rdtok          59     56      1.061
rectoy          8      9      0.909
serialize      27     30      0.875
zebra         125    129      0.969

Overall                        0.99

Table 2. Comparison of Checking Times

The last column (R/S) compares the size of the reduced certificate to the source 
code (i.e., the size of the final package to be submitted to the consumer). The 
results show the size of the reduced certificate to be very reasonable. It ranges from 
0.018 times the size of the source code (for zebra) to 1.144 (in the case of serialize). 
Overall, it is 0.28 times the size of the source code. We consider this satisfactory 
since in general (C)LP programs are quite compact (up to 10 times more compact 
than equivalent imperative programs). 
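The ratio columns of Table 1 can be recomputed from the byte counts, which is a convenient sanity check on the figures quoted above. The Python sketch below uses three rows copied from the table; the function name `ratios` is ours. One assumption is worth stating: BC/S is computed here as ByteC divided by Source with no further subtraction, which reproduces the printed ratios and suggests that the ByteC column already has the 4180 bytes of the empty program removed.

```python
# (Source, ByteC, FCert, RCert) in bytes, copied from Table 1.
ROWS = {
    "rectoy": (154, 140, 167, 40),
    "serialize": (987, 3801, 1779, 1129),
    "zebra": (2284, 5396, 4058, 40),
}


def ratios(source, bytec, fcert, rcert):
    """Return (BC/S, F/R, R/S) rounded to three decimals."""
    return (
        round(bytec / source, 3),   # bytecode vs. source
        round(fcert / rcert, 3),    # full vs. reduced certificate
        round(rcert / source, 3),   # reduced certificate vs. source
    )


if __name__ == "__main__":
    for name, sizes in ROWS.items():
        print(name, ratios(*sizes))
```

For zebra this gives F/R = 101.45 and R/S = 0.018, and for serialize R/S = 1.144, matching the extremes discussed in the text.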

7.2 Checking Time of Reduced Certificates 

Table 2 presents our experimental results regarding checking time. Execution
times are given in milliseconds and measure runtime. They are computed as the
arithmetic mean of five runs. For each benchmark, columns C_F and C_R are the
times for executing Checker_f and Checker_r, respectively. Column C_F/C_R
compares both checking times. These times show that the efficiency of Checker_r
is very similar to that of Checker_f in most cases.

The last row (Overall) summarizes the results for the different benchmarks using
a weighted mean which places more importance on those benchmarks with
relatively larger certificates and checking times. We use as weight for each program its
actual checking time. We believe that this weighted mean is more informative than
the arithmetic mean, since, for example, doubling the speed at which a large and
complex program is checked is more relevant than achieving this for small, simple
programs. As mentioned before, the efficiency of the checker for reduced certificates
is very similar to that of Checker_f (the overall slowdown is 0.99).
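A short Python sketch of such a weighted mean over the rows of Table 2 follows. The text does not say which of the two checking times serves as the weight, so the choice of the Checker_r time (C_R) below is an assumption, as are the function names; the ratios are the printed ones, which were presumably obtained from unrounded timings.

```python
# (program, C_F in ms, C_R in ms, printed C_F/C_R), copied from Table 2.
TABLE2 = [
    ("aiakl", 85, 86, 0.986), ("bid", 46, 48, 0.959),
    ("browse", 20, 20, 0.990), ("deriv", 28, 27, 1.038),
    ("grammar", 14, 14, 1.014), ("hanoiapp", 31, 30, 1.033),
    ("occur", 18, 20, 0.911), ("progeom", 17, 16, 1.012),
    ("qsortapp", 24, 19, 1.290), ("query", 13, 14, 0.917),
    ("rdtok", 59, 56, 1.061), ("rectoy", 8, 9, 0.909),
    ("serialize", 27, 30, 0.875), ("zebra", 125, 129, 0.969),
]


def weighted_mean(rows):
    """Mean of the ratios, weighting each program by its C_R time (assumption)."""
    total_weight = sum(c_r for _, _, c_r, _ in rows)
    return sum(c_r * ratio for _, _, c_r, ratio in rows) / total_weight


def arithmetic_mean(rows):
    """Unweighted mean of the ratios, for comparison."""
    return sum(ratio for *_, ratio in rows) / len(rows)


if __name__ == "__main__":
    print(round(weighted_mean(TABLE2), 2))
    print(round(arithmetic_mean(TABLE2), 2))
```

With this weighting the result rounds to the 0.99 reported in the Overall row; long-running benchmarks such as zebra and aiakl dominate the figure, which is the intent described above.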

8 Related Work 

A detailed comparison of the technique of ACC with related methods can be found
in (Albert et al. 2008). In this section, we focus only on work related to certificate
size reduction in PCC. The common idea in order to compress a certificate in the
PCC scheme is to store only the analysis information which the checker is not able
to reproduce by itself (Leroy 2003). In the field of abstract interpretation, this is
known as fixpoint compression and it is being used in different contexts and tools.
For instance, in the Astrée analyzer (Cousot et al. 2005), designed to detect runtime
errors in programs written in C, only one abstract element per loop head is kept
for memory usage purposes. Our solution is an improvement in the sense that some
of these elements may not need to be included in the certificate (i.e., if they are not
relevant). In other words, some loops do not require iteration to reach the fixpoint
and our technique detects this.

With the same purpose of reducing the size of certificates, Necula and Lee
(Necula and Lee 1998) designed a variant of the Edinburgh Logical Framework LF
(Harper et al. 1993), called LF_i, in which certificates (or proofs) discard part of the
information that is redundant or that can be easily synthesized. LF_i inherits from
LF the possibility of encoding several logics in a natural way but avoids the high
degree of redundancy typical of the LF representation of proofs. On the producer
side, the original certificate is an LF proof to which a representation algorithm is
applied. On the consumer side, LF_i proofs are validated by using a one-pass LF type
checker which is able to reconstruct on the fly the missing parts of the proof.
Experimental results for a concrete implementation reveal an important reduction in
the size of certificates (w.r.t. LF representation proofs) and in the checking time.
Although this work attacks the same problem as ours, the underlying techniques
used are clearly different. Furthermore, our certificates may be considered minimal,
whereas in (Necula and Lee 1998) redundant information is still left in the
certificates in order to guarantee a more efficient behaviour of the type checker.

A further step is taken in Oracle-based PCC (Necula and Rahul 2001). This is
a variation of the PCC idea that allows the size of proofs accompanying the code
to adapt to the complexity of the property being checked such that when PCC
is used to verify relatively simple properties such as type safety, the essential
information contained in a proof is significantly smaller than the entire proof. The
proof as an oracle is implemented as a stream of bits aimed at resolving the
non-deterministic interpretation choices. Although the underlying representations and
techniques are different from ours, we share with this work the purpose of reducing
the size of certificates by providing the checker with the minimal information it
requires to perform a proof and the genericity which allows both techniques to deal
with different kinds of properties beyond types.

The general idea of certificate size reduction has also been deployed in lightweight
bytecode verification (LBV) (Rose 1998; Rose 2003). LBV is a practical PCC
approach to Java Bytecode Verification (Leroy 2003) applied to the KVM (an
embedded variant of the JVM). The idea is that the type-based bytecode verification
is split in two phases, where the producer first computes the certificate by means
of a type-based dataflow analyzer and then the consumer simply checks that the
types provided in the code certificate are valid. As in our case, the second phase
can be done in a single, linear pass over the bytecode. However, LBV is limited to
types while ACC generalizes it to arbitrary domains. Also, ACC deals with
multivariance with the associated accuracy gains (while LBV is monovariant). Regarding
the reduction of certificate size, our work characterizes precisely the minimal
information that can be sent for a generic algorithm not tied to any particular graph
traversal strategy. While the original notion of certificate in (Rose 1998) includes
the complete entry solution with respect to each basic block, (Rose 2003) reduces
certificates by sending information only for "backward" jumps. As we have seen
through our running example, (Rose 2003) sends information for all such backward
jumps while our proposal carries the reduction further because it includes only the
analysis information of those calls in the analysis graph whose answers have been
updated, including both branching and non-branching instructions. We believe that
our notion of reduced certificate could also be used within Rose's framework.

As a final remark, the main ideas in ACC shown in Equations 2 and 4 in
Section 2 have been the basis to build a PCC architecture based on certified abstract
interpretation in (Besson et al. 2006). Therefore, this proposal is built on the
basics of ACC for certificate generation and checking, but relies on a certified checker
specified in Coq (Barras et al. 1997) in order to reduce the trusted computing base.
In contrast to our framework, this work is restricted to safety properties which hold
for all states and, for now, it has only been implemented for a particular abstract
domain.